Imagine that one were using JXPath as an access language into a tree that has certain nodes that represent collections that are impractically large or expensive to hold in memory - e.g.,
.../customers[id=12345]
where the customers are really in a database, and there are a bazillion of them. I don't need the full generality of all the queries that one could imagine - just a few kinds of well-indexed queries like this.
Is there a practical way to implement these using the customization capabilities of JXPath? If so, can you point me toward examples, relevant docs, etc.?
Have a look at the JXPath User's Guide. You can create an extension function that takes the query as a parameter.
public static NodeSet getCustomers(String query) {
    List<Customer> l = getCustomersFromMyDatabase(query);
    BasicNodeSet bns = new BasicNodeSet();
    putCustomersIntoNodeSet(bns, l);
    return bns;
}
Your XPath would then look like:
getCustomers('id=123')
You can also have a first parameter of type ExpressionContext, which gives you access to the context object if you need it, etc.
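To wire this up, here is a minimal sketch of registering and invoking such a function; the "ext" prefix and the CustomerFunctions class holding the static method are made-up names for illustration:

import java.util.Iterator;
import org.apache.commons.jxpath.ClassFunctions;
import org.apache.commons.jxpath.JXPathContext;

public class JXPathExtensionExample {
    public static void main(String[] args) {
        // rootObject is whatever object tree you are navigating; a placeholder here
        Object rootObject = new Object();
        JXPathContext context = JXPathContext.newContext(rootObject);

        // CustomerFunctions is the (hypothetical) class declaring the static getCustomers method;
        // its static methods are registered under the "ext" namespace prefix
        context.setFunctions(new ClassFunctions(CustomerFunctions.class, "ext"));

        // Invoke the extension function from an XPath expression and iterate the results
        Iterator<?> customers = context.iterate("ext:getCustomers('id=12345')");
        while (customers.hasNext()) {
            System.out.println(customers.next());
        }
    }
}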
I have been brushing up on my design patterns and came across a thought that I could not find a good answer for anywhere. So maybe someone with more experience can help me out.
Is the DAO pattern only meant to be used to access data in a database?
Most of the answers I found imply yes; in fact, most people who talk or write about the DAO pattern tend to automatically assume that you are working with some kind of database.
I disagree, though. I could have a DAO like the following:
public interface CountryData {
    public List<Country> getByCriteria(Criteria criteria);
}

public final class SQLCountryData implements CountryData {
    public List<Country> getByCriteria(Criteria criteria) {
        // Get from an SQL database.
    }
}

public final class GraphCountryData implements CountryData {
    public List<Country> getByCriteria(Criteria criteria) {
        // Get from an injected in-memory graph data structure.
    }
}
Here I have a DAO interface and two implementations, one that works with an SQL database and one that works with, say, an in-memory graph data structure. Is this correct? Or is the graph implementation meant to be created in some other kind of layer?
And if it is correct, what is the best way to abstract implementation specific details that are required by each DAO implementation?
For example, take the Criteria Class I reference above. Suppose it is like this:
public final class Criteria {
    private String countryName;

    public String getCountryName() {
        return this.countryName;
    }

    public void setCountryName(String countryName) {
        this.countryName = countryName;
    }
}
For the SQLCountryData, it needs to somehow map the countryName property to an SQL identifier so that it can generate the proper SQL. For the GraphCountryData, perhaps some sort of predicate object against the countryName property needs to be created to filter out the vertices of the graph that fail the test.
What's the best way to abstract details like these without coupling client code, which works against the abstract CountryData, to implementation-specific details?
Any thoughts?
EDIT:
The example I included of the Criteria class is simple enough, but consider if I want to allow the client to construct complex criteria, where they not only specify the property to filter on, but also the comparison operator, logical operators for compound criteria, and the value.
DAOs are part of the DAL (Data Access Layer), and you can have data backed by any kind of implementation (XML, RDBMS, etc.). You just need to ensure that the proper instance is injected/used at runtime; DI frameworks like Spring or Guice shine in this case. Also, your Criteria interface/implementation should be generic enough that only business details are captured (i.e. the country-name criterion) and the actual mapping is again handled by the implementation class.
For SQL, in your case, you can either hand-generate the SQL, generate it using a helper library like Spring, or use a full-fledged framework like MyBatis. In our project, Spring XML configuration files were used to decouple the client from the implementation; it might vary in your case.
EDIT: I see that you have raised a similar concern in the previous question. The answer still remains the same. You can add as much flexibility as you want in your interface; you just need to ensure that the implementation is smart enough to make sense of all the arguments it receives and maps them appropriately to the underlying source. In our case, we retrieved the value object from the business layer and converted it to a map in the SQL implementation layer which can be used by MyBatis. Again, this process was pretty much transparent and the only way for the service layer to communicate with DAO was via the interface defined value objects.
No, I don't believe it's tied to only databases. The acronym is for Data Access Object, not "Database Access Object" so it can be usable with any type of data source.
The whole point of it is to separate the application from the backing data store so that the store can be modified at will, provided it still follows the same rules.
That doesn't just mean turfing Oracle and putting in DB2. It could also mean switching to a totally non-DBMS-based solution.
OK, this is a bit of a philosophical question, so I'll tell you what I think about it.
DAO usually stands for Data Access Object. The source of data is not necessarily a database, although in the real world implementations usually end up there.
It can be XML, a text file, some remote system, or, as you stated, an in-memory graph of objects.
From what I've seen in real-world projects, yes, you're right: you should provide different DAO implementations for accessing the data in different ways.
In this case one DAO implementation goes to the database, and another goes to the object graph.
The DAO interface has to be designed very carefully. Your Criteria has to be generic enough to abstract over where the data is coming from.
How do you achieve this level of decoupling? The answer can vary depending on your system, but in general I would say: as usual, by adding another level of indirection. :)
You can also think of your criteria object as a plain data object in which you supply only the data needed for the query. In that case you won't even need to support different Criteria types.
Each DAO implementation will take this data and interpret it in its own way: one will construct a query against the graph, another will bind it to your SQL.
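As a rough illustration of that idea, a small storage-agnostic criterion object might look like this; the Criterion and Operator names are invented for the example, not part of any framework:

// A small, storage-agnostic description of a single filter condition.
public final class Criterion {
    public enum Operator { EQUALS, CONTAINS, GREATER_THAN }

    private final String property;   // e.g. "countryName"
    private final Operator operator;
    private final Object value;

    public Criterion(String property, Operator operator, Object value) {
        this.property = property;
        this.operator = operator;
        this.value = value;
    }

    public String getProperty() { return property; }
    public Operator getOperator() { return operator; }
    public Object getValue() { return value; }
}

Each DAO implementation then interprets it in its own way: the SQL DAO maps the property to a column and the operator to SQL syntax, while the graph DAO turns it into a predicate applied to each vertex.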
To minimize maintenance hassle, I would suggest using a dependency injection framework (Spring, for example). These frameworks are well suited to instantiating your DAO objects and wiring them together.
Good Luck!
No, DAO for databases only is a common misconception.
DAO is a "Data Access Object", not a "Database Access Object". Hence anywhere you need to CRUD data to/from ( e.g. file, memory, database, etc.. ), you can use DAO.
In Domain Driven Design there is a Repository pattern. While Repository as a word is far better than three random letters (DAO), the concept is the same.
The purpose of the DAO/Repository pattern is to abstract a backing data store, which can be anything that can hold a state.
One of the key benefits of NoSQL data stores like MongoDB is that they're schemaless. With dynamically typed languages this seems to be a natural fit. You can receive some arbitrary JSON input, perform business logic on the known fields, and persist the whole thing without first having to define the object.
What if your choice of language is limited to the statically typed, say Java? How could I achieve the same level of flexibility?
A typical data flow looks like the following:
JSON input
Deserialize into a Java object to perform business logic
Serialize to BSON to persist in Mongo
where the JSON-to-object step is necessary since you want to perform business logic against POJOs, not JSON strings. However, before I can map the input into objects, I must define the class first. What if the input contains additional fields undefined in the object? While they may not be used in the business logic, I may still want to be able to persist them. I have seen implementations where the undefined fields are put into a map, but I am not sure that's the best approach. For one, the undefined fields may be complex objects as well.
Schemaless data doesn't necessarily mean structureless data; the fields are typically known in advance and some type-safe pattern can be applied on top of it to avoid the Magic Container anti-pattern. But this is not always the case. Sometimes keys are entered by the user and cannot be known in advance.
I've used the Role Object Pattern several times to give coherence to a dynamic structure. I think it is well suited here for both cases.
The Role Object Pattern defines a way to access different views of an object. The canonical example being a User that can assume several roles such as Customer, Vendor, and Seller. Each of these views has different operations it can perform and can be accessed from any of the other views. Common fields are typically available at the interface level (especially userId(), or in your case toJson()).
Here's an example of using the pattern:
public void displayPage(User user) {
    display(user.getName());

    if (user.hasView(Customer.class))
        displayShoppingCart(user.getView(Customer.class));

    if (user.hasView(Seller.class))
        displayProducts(user.getView(Seller.class));
}
In the case of data with a known structure, you can have several views bringing different sets of keys into cohesive units. These different views can read the json data on construction.
In the case of data with a dynamic structure, an authoritative RawDataView can hold the data in its dynamic form (i.e. a Magic Container like a HashMap<String, Object>). This can be used to query the dynamic data. At the same time, type-safe wrappers can be created lazily and can delegate to the RawDataView to assist in program readability/maintainability:
public class Customer implements User {

    private final RawDataView data;

    public Customer(User source) {
        this.data = source.getView(RawDataView.class);
    }

    // All User views must provide these
    @Override
    public long id() {
        return data.getId();
    }

    @Override
    public <T extends User> T getView(Class<T> view) {
        // construct or look up the view
    }

    @Override
    public Json toJson() {
        return data.toJson();
    }

    //
    // Specific to Customer
    //
    @SuppressWarnings("unchecked")
    public List<Item> shoppingCart() {
        return (List<Item>) data.getValue("items", List.class);
    }

    // etc....
}
I've had success with both of these approaches. Here are some extra pointers that I've discovered along the way:
Have as static a structure to your data as possible. This makes things a lot easier to maintain. I had to break this rule and use the RawDataView approach when working on a legacy system. You may also have to break it with dynamically entered user data, as mentioned above. In that case, use a convention for non-dynamic field names, such as a leading underscore (_userId).
Have equals() and hashCode() implemented such that user.getView(A.class).equals(user.getView(B.class)) is always true for the same user.
Have a UserCore class that does all the heavy lifting of the common code, such as creating views, performing common operations (like toJson()), returning common fields (like userId()), and implementing equals() and hashCode(). Have all views delegate to this core object.
Have an AbstractUserView that delegates to the UserCore and implements equals() and hashCode().
Use a type-safe heterogeneous container (like ClassToInstanceMap) for constructing/caching views; see the sketch after this list.
Allow the existence of a view to be queried. This can be done with either a hasView() method or by having getView return Optional<T>
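For the view-caching point above, here is a minimal sketch of a core object backed by Guava's ClassToInstanceMap; the User interface is carried over from the example, and how views are actually constructed is left open:

import com.google.common.collect.ClassToInstanceMap;
import com.google.common.collect.MutableClassToInstanceMap;

public class UserCore {

    // Type-safe heterogeneous container: at most one cached instance per view class
    private final ClassToInstanceMap<User> views = MutableClassToInstanceMap.create();

    public <T extends User> T getView(Class<T> viewClass) {
        T view = views.getInstance(viewClass);
        if (view == null) {
            view = createView(viewClass);         // build lazily on first request
            views.putInstance(viewClass, view);   // cache for subsequent lookups
        }
        return view;
    }

    private <T extends User> T createView(Class<T> viewClass) {
        // View construction is application-specific (e.g. a factory per view class)
        throw new UnsupportedOperationException("not shown in this sketch");
    }
}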
You can always have a class which provides both:
easy access to attributes you know about, with optional fallbacks to older formats (for example, it can return "name" if it exists, or fall back to "name.first" + "name.last" if it doesn't, or some similar scenario)
easy access to unknown elements by simulating the map interface
Whether you do a full validation or not, whether you allow extra undefined attributes or not depends on what you want to achieve. But I think that creating an abstraction which allows you either way of accessing the data is the best solution.
Hopefully over time, you'll get to the stage where your schema is pretty much stable and messing directly with the attributes is not needed anymore.
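A minimal sketch of such a wrapper, assuming the data arrives as a Map<String, Object> and using made-up field names:

import java.util.Collections;
import java.util.Map;

public class FlexibleDocument {

    private final Map<String, Object> attributes;

    public FlexibleDocument(Map<String, Object> attributes) {
        this.attributes = attributes;
    }

    // Typed access to a known attribute, with a fallback to an older format
    public String getName() {
        Object name = attributes.get("name");
        if (name != null) {
            return name.toString();
        }
        // older documents stored the name split into two fields
        return attributes.get("name.first") + " " + attributes.get("name.last");
    }

    // Map-like access to everything else, including fields we did not anticipate
    public Object get(String key) {
        return attributes.get(key);
    }

    public Map<String, Object> asMap() {
        return Collections.unmodifiableMap(attributes);
    }
}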
This is not well solved in Java due to the lack of dynamic types. One way it can be handled is with maps:
Map<String, Object>
where each value can in turn be another Map of objects. This is not an elegant approach, but it works in Java. As an example, the SnakeYAML library for YAML allows traversal in this way.
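A rough sketch of what that looks like in practice; the YAML content and keys are made up, and SnakeYAML parses the document into nested Maps and Lists that you cast and walk:

import java.util.Map;
import org.yaml.snakeyaml.Yaml;

public class DynamicDataExample {
    @SuppressWarnings("unchecked")
    public static void main(String[] args) {
        String yaml = "customer:\n  name: Alice\n  address:\n    city: Oslo\n";

        // SnakeYAML parses the document into nested Maps and Lists
        Map<String, Object> root = (Map<String, Object>) new Yaml().load(yaml);

        // Traversal is a chain of casts and key lookups
        Map<String, Object> customer = (Map<String, Object>) root.get("customer");
        Map<String, Object> address = (Map<String, Object>) customer.get("address");
        System.out.println(address.get("city"));   // prints "Oslo"
    }
}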
I'm trying to implement a custom scoring formula in Lucene that has nothing to do with tf-idf (so changing just the similarity, for example, will not work).
In order to do this, I need to be able to take my custom Query and generate a score for every document stored in the index - not just the ones that match the terms in the query (since my scoring involves checking what are essentially synonyms, so even if a doc doesn't have the exact Terms, it could still produce a positive score). Is the best way to simply create an IndexReader and call Document d = reader.doc(i) for all docs (as described here), and then generate a score on the spot?
I've been looking around at Lucene's scoring internals, specifically various Scorer and Collector classes, and it appears that what happens (for Lucene 3.2) is a Weight provides a Scorer, which along with the Collector loops through all documents that match the query. Can I utilize this structure in some way, but again get a custom Scorer implementation to consider ALL documents?
If you decide to go for a custom scoring scheme, the proper way is to use a subclass of CustomScoreQuery with getCustomScoreProvider overridden to return your subclass of CustomScoreProvider. The CustomScoreQuery constructor requires a subquery. Here you will want to provide a fast native Lucene Query that will narrow down the result set as much as possible before going through your custom score calculation. You can also arrange to store any number of float values with each of your docs and make those accessible to your custom score provider. You will need to provide an appropriate ValueSourceQuery to the constructor of CustomScoreQuery for each such float value. See the Javadocs on these classes, they are well written. Unfortunately I don't have a Java snippet at hand.
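Roughly, though, the shape of it in Lucene 3.x is something like the following sketch; SynonymScoreQuery and myCustomFormula are placeholder names, and the actual scoring logic is yours:

import java.io.IOException;
import org.apache.lucene.index.IndexReader;
import org.apache.lucene.search.Query;
import org.apache.lucene.search.function.CustomScoreProvider;
import org.apache.lucene.search.function.CustomScoreQuery;

public class SynonymScoreQuery extends CustomScoreQuery {

    public SynonymScoreQuery(Query subQuery) {
        super(subQuery);
    }

    @Override
    protected CustomScoreProvider getCustomScoreProvider(IndexReader reader) throws IOException {
        return new CustomScoreProvider(reader) {
            @Override
            public float customScore(int doc, float subQueryScore, float valSrcScore) throws IOException {
                // "doc" is the document id within this reader; per-document values
                // can be consulted here (e.g. via FieldCache or stored fields).
                return myCustomFormula(doc, subQueryScore);
            }
        };
    }

    private float myCustomFormula(int doc, float subQueryScore) {
        // placeholder for the synonym-based scoring logic
        return subQueryScore;
    }
}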
As I understand Lucene, it stores (term, doc) pairs in its index, so that querying is implemented as
Get documents containing the query terms,
score/sort them.
I've never implemented my own scoring, but I'd look at IndexReader.termDocs first; it seems to implement step 1.
With IndexReader.termDocs you can iterate through a term's posting list, that is, all documents that contain that term. You could use this to build your own query processing entirely on top of Lucene, but then you won't be able to use any of the Query, Similarity, and related machinery.
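For reference, a rough sketch of walking a posting list in Lucene 3.x; the field and term values are placeholders:

import java.io.IOException;
import org.apache.lucene.index.IndexReader;
import org.apache.lucene.index.Term;
import org.apache.lucene.index.TermDocs;

public class PostingListWalker {
    // Walks the posting list of one term; "contents" and "synonym" are placeholder values.
    public void walk(IndexReader reader) throws IOException {
        TermDocs termDocs = reader.termDocs(new Term("contents", "synonym"));
        try {
            while (termDocs.next()) {
                int docId = termDocs.doc();   // internal document id
                int freq = termDocs.freq();   // how often the term occurs in that document
                // accumulate your own score contribution for docId here
            }
        } finally {
            termDocs.close();
        }
    }
}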
Also, if you are working with synonyms, Lucene has some things in the contrib package. Another possible solution, if you haven't tried it, is to inject synonyms into the documents through an Analyzer (or similar). That way you could return documents even if they don't contain the query terms.
I'm developing a framework in Java which relies on a number of XML files with a large number of parameters.
When reading the parameters from the XML file, I have to have a large if-else statement to decide what the parameter is and then call the appropriate method.
Is this normal, to have a large if-else statement?
I am thinking that there is a simpler and neater way of doing this, e.g. Java XML mapping or Java reflection. Is this the answer? If so, can you please provide examples of how this is done so I don't have to rely on a large if-else statement?
Thanks!
You want to first create an interface:
public interface XMLParameterHandler {
    void handle_parameter(String xmlData);
}
Next you want to create a map:
private Map<String, XMLParameterHandler> handlers;
...and initialize it with one of the relevant Map implementations:
this.handlers = new HashMap<>();
You need to implement the interface on a number of classes, one for each parameter you intend to handle. This is a good use of inner classes. Insert each of these implemented handlers into the map:
handlers.put ("Param1", new XMLParam1HandlerImpl());
handlers.put ("Param2", new XMLParam2HandlerImpl());
Then you can call the handler from the XML processing loop:
handlers.get(paramName).handle_parameter(xmlData);
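Since inner classes were mentioned, a handler can also be registered inline as an anonymous class; the body here (parsing a numeric setting) is purely illustrative:

handlers.put("Timeout", new XMLParameterHandler() {
    public void handle_parameter(String xmlData) {
        int timeout = Integer.parseInt(xmlData.trim());
        // ...apply the timeout to your framework's configuration...
    }
});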
There is JAXB (http://en.wikipedia.org/wiki/Java_Architecture_for_XML_Binding) for mapping a Java class to XML.
But you can't map methods with it: you can only map attributes to XML file values (i.e. deserialize parameters from XML).
I recommend using a Map that has the parameter as the key and the XML entry as the value (not the whole XML).
Reflection would be one approach. Perhaps combined with a custom annotation on the target method to indicate which parameter to pass to that method. This is an advanced technique, though.
A more standard technique would be to use a map, where the key is the attribute name, and the value is an instance of an implementation of some interface you define, like AttributeHandler. The implementations then contain the code for each attribute. This involves writing a lot of little classes, but you can do them as anonymous classes to save space and keep the code inline.
a large if-else statement to decide what the parameter is and then call appropriate methods
You could instead use the Strategy design pattern, with one Strategy object per parameter, and use a map from the parameter name to the Strategy object to use. I've found this approach useful for even a moderately complicated application of XML.
It sounds to me as if you want a data-driven rule-based approach to writing your application, rather like you get in XSLT. One way of achieving this is to write it in XSLT instead of Java - XSLT, after all, was specifically designed for processing XML, while Java wasn't. If you can't do that, you could study how XSLT does it using rules and actions, and emulate this design in your Java code.
N functions with M parameters can always be implemented with a single function with M + 1 parameters.
If you need a big if then else statement to decide which method to dispatch to, then you can just add a parameter to your method and call a single method.
You shouldn't need an if-then-else statement to bind the parameter values.
If there is complex logic dependent on the particular parameter values, you might use a table-driven approach. You can map various combinations of parameter values into equivalence classes, then various equivalence-class combinations into a row in a table with a unique id, then have a switch statement based on that unique id.
My Java (JDK6) project uses Spring and JDBCTemplate for all its database access. We recently upgraded from Spring 2.5 to Spring 3 (RC1). The project does not use an ORM like Hibernate nor EJB.
If I need to read a bunch of records, and do some internal processing with them, it seems like there are several (overloaded) methods: query, queryForList and queryForRowSet
What should be the criteria to use one instead of the other? Are there any performance differences? Best practices?
Can you recommend some external references for further research on this topic?
I find that the standard way to get a list back is via the query() methods rather than any of the other approaches. The main difference between query and the other methods is that you'll have to implement one of the callback interfaces (either RowMapper, RowCallbackHandler, or ResultSetExtractor) to handle your result set.
A RowMapper is likely what you'll find yourself using most of the time. It's used when each row of the result set corresponds to one object in your list. You only have to implement a single method, mapRow, where you populate the object that represents a row and return it. Spring also has a BeanPropertyRowMapper, which can populate the objects in a list by matching the bean property names to the column names (NB: this class is provided for convenience, not performance).
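For illustration, a minimal RowMapper-based query might look like this; the Customer class, table, and column names are assumed for the example, and jdbcTemplate is an injected JdbcTemplate:

// Customer is your own value object with setId/setName
List<Customer> customers = jdbcTemplate.query(
        "SELECT id, name FROM customer WHERE country = ?",
        new Object[] { "NO" },
        new RowMapper<Customer>() {
            public Customer mapRow(ResultSet rs, int rowNum) throws SQLException {
                Customer c = new Customer();
                c.setId(rs.getLong("id"));
                c.setName(rs.getString("name"));
                return c;
            }
        });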
A RowCallbackHandler is more useful when you need your results to be more than just a simple list. You'll have to manage the return object yourself if you are using this approach. I usually find myself using this when I need a map structure as my return type (i.e. for grouped data in a tree table, or if I'm creating a custom cache keyed by the primary key).
A ResultSetExtractor is used when you want to control the iteration of the results. You implement a single method, extractData, whose result will be the return value of the call to query. I only find myself using this if I have to build some custom data structure that is more complex to build using either of the other callback interfaces.
The queryForList() methods are valuable in that you don't have to implement these callback methods. There are two ways to use queryForList. The first is if you're only querying a single column from the database (for example, a list of strings): you can use the versions of the method that take a Class as an argument to automatically give you a list of objects of that class.
When calling the other implementations of queryForList() you'll get a list back with each entry being a map of column names to values. While this is nice in that you are saved the expense of writing the callback methods, dealing with this data structure is quite unwieldy. You'll find yourself doing a lot of casting since the map's values are of type Object.
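A sketch of the two queryForList variants described above; table and column names are placeholders, and jdbcTemplate is the same injected JdbcTemplate:

// Single-column variant: the element type is given by the Class argument
List<String> names = jdbcTemplate.queryForList(
        "SELECT name FROM customer", String.class);

// General variant: one Map per row, keyed by column name, values typed as Object
List<Map<String, Object>> rows = jdbcTemplate.queryForList(
        "SELECT id, name FROM customer");
for (Map<String, Object> row : rows) {
    Long id = ((Number) row.get("id")).longValue();   // casting is up to you
    String name = (String) row.get("name");
}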
I've actually never seen the queryForRowSet methods used in the wild. These will load the entire result of the query into a CachedRowSet object wrapped by a Spring SqlRowSet. I see a big downside in using this object: if you're passing the SqlRowSet around to the other layers of your application, you're coupling those layers to your data access implementation.
You shouldn't see any huge performance differences between any of these calls except as I mentioned with the BeanPropertyRowMapper. If you're working with some complex manipulation of a large result set, you might be able to get some performance gains from writing an optimized ResultSetExtractor for your specific case.
If you want to learn more I would consult the Spring JDBC documentation and the JavaDoc for the classes I've mentioned. You can also take a look at some of the books on the Spring Framework. Though it's a bit dated Java Development with the Spring Framework has a very good section on working with the JDBC framework. Most of all, I would say just try writing some code with each method and see what works best for you.
Since you are in the wonderful land of generics, what you may really want to do is use SimpleJdbcTemplate and its query() methods for lists of objects and queryForObject() for individual objects. The reasoning for this is simply that they're even easier to use than the ones in JdbcTemplate.
One small addition to the excellent answers above: additional methods, like queryForInt, queryForLong, queryForMap, queryForObject, etc. might seem like good options at times if you're running a simple query and expect a single row.
However, if you could get 0 or 1 rows back, the queryForList method is generally easier, otherwise you'd have to catch IncorrectResultSizeDataAccessException. I learned that the hard way.
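A rough sketch of that pattern; the query and types are placeholders, and jdbcTemplate is an injected JdbcTemplate:

// queryForObject throws IncorrectResultSizeDataAccessException on 0 or more than 1 rows,
// so for "maybe one row" queries it is often simpler to ask for a list and check it.
List<String> emails = jdbcTemplate.queryForList(
        "SELECT email FROM customer WHERE id = ?",
        new Object[] { 12345L }, String.class);

String email = emails.isEmpty() ? null : emails.get(0);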