I need to query the names of "available" sub elements of an element node in DOM.
For example if schema says "There can be age, name, occupation elements under person element." then I wanna function like this,
import org.w3c.dom.Element;
Element person_element;
String[] names_of_available_sub_element =
get_available_sub_element_names(person_element);
which makes
names_of_available_sub_element == {"age", "name", "occupation"}.
How can I implement this function?
This isn't easy, but it can be done if you're prepared to put a lot of work in.
There are a number of approaches to getting information from an XSD schema. You could try and process the XSD source code, but I wouldn't recommend that, because there are so many things you have to take into account (wildcards, substitution groups, types derived by restriction and extension, and so on). A better approach is to use some kind of API that gives you access to the information in digested form. For that, some possible suggestions are:
(a) Xerces gives you a Java API providing programmatic access to the compiled schema.
(b) Saxon gives you two possibilities: (i) the SCM file, which is an XML representation of the compiled schema, and (ii) an XPath API giving programmatic access to the compiled schema using extension functions.
Do remember that knowing you're at a "person" element isn't (in the general case) enough to determine what the permitted children are. That's because there can be global and local elements using the name "person", but with different types. Whether this is a problem in your case depends on what you are trying to achieve, which you haven't really explained in much detail.
Related
My question more specificity is this:
I want users on multiple front ends to see the "Type" of a database row. Let's say for ease that I have a person table and the types can be Student, Teacher, Parent etc.
The specific program would be java with hibernate, however I doubt that's important for the question, but let's say my data is modelled in to Entity beans and a Person "type" field is an enum that contains my 3 options, ideally I want my Person object to have a getType() method that my front end can use to display the type, and also I need a way for my front end to know the potential types.
With the enum method I have this functionality but what I don't have is the ability to easily add new types without re-compiling.
So next thought is that I put my types in to a config file and simply story them in the database as strings. my getType() method works, but now my front end has to load a config file to get the potential types AND now there's nothing to keep them in sync, I could remove a type from my config file and the type in the database would point to nothing. I don't like this either.
Final thought is that I create a PersonTypes database table, this table has a number for type_id and a string defining the type. This is OK, and if the foreign key is set up I can't delete types that I'm using, my front end will need to get sight of potential types, I guess the best way is to provide a service that will use the hibernate layer to do this.
The problem with this method is that my types are all in English in the database, and I want my application to support multiple languages (eventually) so I need some sort of properties file to store the labels for the types. so do I have a PersonType table the purely contains integers and then a properties file that describes the label per integer? That seems backwards?
Is there a common design pattern to achieve this kind of behaviour? Or can anyone suggest a good way to do this?
Regards,
Glen x
I would go with the last approach that you have described. Having the type information in separate table should be good enought and it will let you use all the benefits of SQL for managing additional constraints (types will be probably Unique and foreign keys checks will assure you that you won't introduce any misbehaviour while you delete some records).
When each type will have i18n value defined in property files, then you are safe. If the type is removed - this value will not be used. If you want, you can change properties files as runtime.
The last approach I can think of would be to store i18n strings along with type information in PersonType. This is acceptable for small amount of languages, altough might be concidered an antipattern. But it would allow you having such method:
public String getName(PersonType type, Locale loc) {
if (loc.equals(Locale.EN)) {
return type.getEnglishName();
} else if (loc.equals(Locale.DE)){
return type.getGermanName();
} else {
return type.getDefaultName();
}
}
Internationalizing dynamic values is always difficult. Your last method for storing the types is the right one.
If you want to be able to i18n them, you can use resource bundles as properties files in your app. This forces you to modify the properties files and redeploy and restart the app each time a new type is added. You can also fall back to the English string stored in database if the type is not found in the resource bundle.
Or you can implement a custom ResourceBundle class that fetches its keys and values from the database directly, and have an additional PersonTypeI18n table which contains the translations for all the locales you want to support.
You can use following practices:
Use singleton design pattern
Use cashing framework such as EhCashe for cashe type of person and reload when need.
I'm working with three separate classes: Group, Segment and Field. Each group is a collection of one or more segments, and each segment is a collection of one or more fields. There are different types of fields that subclass the Field base class. There are also different types of segments that are all subclasses of the Segment base class. The subclasses define the types of fields expected in the segment. In any segment, some of the fields defined must have values inputted, while some can be left out. I'm not sure where to store this metadata (whether a given field in a segment is optional or mandatory.)
What is the most clean way to store this metadata?
I'm not sure you are giving enough information about the complete application to get the best answer. However here are some possible approaches:
Define an isValid() method in your base class, which by default returns true. In your subclasses, you can code specific logic for each Segment or FieldType to return false if any requirements are missing. If you want to report an error message to say which fields are missing, you could add a List argument to the isValid method to allow each type to report the list of missing values.
Use Annotations (as AlexR said above).
The benefit of the above 2 approaches is that meta data is within the code, tied directly to the objects that require it. The disadvantage is that if you want to change the required fields, you will need to update the code and deploy a new build.
If you need something which can be changed on the fly, then Gangus suggestion of Xml is a good start, because your application could reload the Xml definition at run-time and produce different validation results.
I think, the best placement for such data will be normal XML file. And for work with such data the best structure will be also XMLDOM with XPATH. Work with classes will be too complicated.
Since java 5 is released this kind of metadata can be stored using annotations. Define your own annotation #MandatoryField and mark all mandatory fields with it. Then you can discover object field-by-field using reflection and check whether not initiated fields are mandatory and throw exception in this case.
Until now, I've been handling extensions by defining a placeholder element that has "name" and "value" attributes as shown in the below example
<root>
<typed-content>
...
</typed-content>
<extension name="var1" value="val1"/>
<extension name="var2" value="val2"/>
....
</root>
I am now planning to switch to using xsd:any. I'd appreciate if you can help me choose th best approach
What is the value add of xsd:any over my previous approach if I specify processContents="strict"
Can a EAI/ESB tool/library execute XPATH expressions against the arbitrary elements I return
I see various binding tools treating this separately while generating the binding code. Is this this the same case if I include a namespace="http://mynamespace" and provide the schema for the "http://mynamespace" during code gen time?
Is this WS-I compliant?
Are there any gotchas that I am missing?
Thank you
Using <xsd:any processContents="strict"> gives people the ability to add extensions to their XML instance documents without changing the original schema. This is the critical benefit it gives you.
Yes. tools than manipulate the instances don't care what the schema looks like, it's the instance documents they look at. To them, it doesn't really matter if you use <xsd:any> or not.
Binding tools generally don't handle <xsd:any> very elegantly. This is understandable, since they have no information about what it could contain, so they'll usually give you an untyped placeholder. It's up the the application code to handle that at runtime. JAXB is particular (the RI, at least) makes a bit of a fist of it, but it's workable.
Yes. It's perfectly good XML Schema practice, and all valid XML Schema are supported by WS-I
<xsd:any> makes life a bit harder on the programmer, due to the untyped nature of the bindings, but if you need to support arbitrary extension points, this is the way to do it. However, if your extensions are well-defined, and do not change, then it may not be worth the irritation factor.
Regarding point 3
Binding tools generally don't handle
very elegantly. This is
understandable, since they have no
information about what it could
contain, so they'll usually give you
an untyped placeholder. It's up the
the application code to handle that at
runtime. JAXB is particular (the RI,
at least) makes a bit of a fist of it,
but it's workable.
This corresponds to the #XmlAnyElement annotation in JAXB. The behaviour is as follows:
#XmlAnyElement - Keep All as DOM Nodes
If you annotate a property with this annotation the corresponding portion of the XML document will be kept as DOM nodes.
#XMLAnyElement(lax=true) - Convert Known Elements to Domain Objects
By setting lax=true, if JAXB has a root type corresponding to that QName then it will convert that chunk to a domain object.
http://bdoughan.blogspot.com/2010/08/using-xmlanyelement-to-build-generic.html
I am using JAXB to generate XML from Java objects, it's a realtime, quite high message rate application and works fine most of the time. However occassionally and without any obvious clues as to why, I am getting duplicate namespace declarations in the generated XML. eg:
<UpdateRequest xmlns="http://xml.mycomp.com/ns/myservice"
xmlns="http://xml.mycomp.com/ns/myservice">
<field1>value</field1>
...
</UpdateRequest>
Has anyone seen this behaviour before?
Check if the xsd code of this class allow the creation of more than 1 instance of the repeated attribute. if so, you can avoid this repetitions setting the number of instances of the xmlns attribute for each UpdateRequest object.
If the problem is your code (maybe there is being created this attribute twice) and you have limited the number of instances of the attribute (as i said above), the program will show an error at runtime complaining that you are trying to insert an attribute already defined.
A solution might be available at this link.
here's the relevant section quoted verbatim from the above link that may be relevant for you:
Similar explicit inclusion of a schema
type in an instance document's element
occurs if you instantiate a JAXB
element using an object of some
(abstract) XML schema base type so
that the element would have the
element tag of the base type.
Second, avoid xs:anySimpleType since
this will also create multiple
references to the namespaces bound to
xsi and xs, and type attributes
containing the actual type. And you
lose JAXB's advantage of having typed
fields in your Java classes so that
you lose all the checks the Java
compiler might do, and for
unmarshalling you'll have to handle
all the conversions yourself.
Most projects have some sort of data that are essentially static between releases and well-suited for use as an enum, like statuses, transaction types, error codes, etc. For example's sake, I'll just use a common status enum:
public enum Status {
ACTIVE(10, "Active");
EXPIRED(11, "Expired");
/* other statuses... */
/* constructors, getters, etc. */
}
I'd like to know what others do in terms of persistence regarding data like these. I see a few options, each of which have some obvious advantages and disadvantages:
Persist the possible statuses in a status table and keep all of the possible status domain objects cached for use throughout the application
Only use an enum and don't persist the list of available statuses, creating a data consistency holy war between me and my DBA
Persist the statuses and maintain an enum in the code, but don't tie them together, creating duplicated data
My preference is the second option, although my DBA claims that our end users might want to access the raw data to generate reports, and not persisting the statuses would lead to an incomplete data model (counter-argument: this could be solved with documentation).
Is there a convention that most people use here? What are peoples' experiences with each and are there other alternatives?
Edit:
After thinking about it for a while, my real persistence struggle comes with handling the id values that are tied to the statuses in the database. These values would be inserted as default data when installing the application. At this point they'd have ids that are usable as foreign keys in other tables. I feel like my code needs to know about these ids so that I can easily retrieve the status objects and assign them to other objects. What do I do about this? I could add another field, like "code", to look stuff up by, or just look up statuses by name, which is icky.
We store enum values using some explicit string or character value in the database. Then to go from database value back to enum we write a static method on the enum class to iterate and find the right one.
If you expect a lot of enum values, you could create a static mapping HashMap<String,MyEnum> to translate quickly.
Don't store the actual enum name (i.e. "ACTIVE" in your example) because that's easily refactored by developers.
I'm using a blend of the three approaches you have documented...
Use the database as the authoritative source for the Enum values. Store the values in a 'code' table of some sort. Each time you build, generate a class file for the Enum to be included in your project.
This way, if the enum changes value in the database, your code will be properly invalidated and you will receive appropriate compile errors from your Continuous Integration server. You have a strongly typed binding to your enumerated values in the database, and you don't have to worry about manually syncing the values between code and the data.
Joshua Bloch gives an excellent explanation of enums and how to use them in his book "Effective Java, Second Edition" (p.147)
There you can find all sorts of tricks how to define your enums, persist them and how to quickly map them between the database and your code (p.154).
During a talk at the Jazoon 2007, Bloch gave the following reasons to use an extra attribute to map enums to DB fields and back: An enum is a constant but code isn't. To make sure that a developer editing the source can't accidentally break the DB mapping by reordering the enums or renaming then, you should add a specific attribute (like "dbName") to the enum and use that to map it.
Enums have an intrinsic id (which is used in the switch() statement) but this id changes when you change the order of elements (for example by sorting them or by adding elements in the middle).
So the best solution is to add a toDB() and fromDB() method and an additional field. I suggest to use short, readable strings for this new field, so you can decode a database dump without having to look up the enums.
While I am not familiar with the idea of "attributes" in Java (and I don't know what language you're using), I've generally used the idea of a code table (or domain specific tables) and I've attributed my enum values with more specific data, such as human readable strings (for instance, if my enum value is NewStudent, I would attribute it with "New Student" as a display value). I then use Reflection to examine the data in the database and insert or update records in order to bring them in line with my code, using the actual enum value as the key ID.
What I used in several occations is to define the enum in the code and a storage representation in the persistence layer (DB, file, etc.) and then have conversion methods to map them to each other. These conversion methods need only be used when reading from or writing to the persistent store and the application can use the type safe enums everywhere. In the conversion methods I used switch statements to do the mapping. This allows also to throw an exception if a new or unknown state is to be converted (usually because either the app or the data is newer than the other and new or additional states had been declared).
If there's at least a minor chance that list of values will need to be updated than it's 1. Otherwise, it's 3.
Well we don't have a DBA to answer to, so our preference is for option 2).
We simply save the Enum value into the database, and when we are loading data out of the database and into our Domain Objects, we just cast the integer value to the enum type.
This avoids any of the synchronisation headaches with options 1) and 3). The list is defined once - in the code.
However, we have a policy that nobody else accesses the database directly; they must come through our web services to access any data. So this is why it works well for us.
In your database, the primary key of this "domain" table does't have to be a number. Just use a varchar pk and a description column (for the purposes your dba is concerned). If you need to guarantee the ordering of your values without relying on the alphabetical sor, just add a numeric column named "order or "sequence".
In your code, create a static class with constants whose name (camel-cased or not) maps to the description and value maps to the pk. If you need more than this, create a class with the necessary structure and comparison operators and use instances of it as the value of the constants.
If you do this too much, build a script to generate the instatiation / declaration code.