How to read schema-less documents using Java from MongoDB

MongoDB gives you the ability to write documents of any structure, i.e. any number and any types of key/value pairs can be written. Assuming that I use this feature and my documents are indeed schema-less, how do I manage reads? Basically, how does the application code (I'm using Java) manage reads from the database?

The Java driver reads and writes documents as BasicBSONObject instances, which implement and are used as Map<String, Object>. Your application code is then responsible for reading this map and casting the values to the appropriate types.
A mapping framework like Morphia or Spring MongoDB can help you to convert BSONObject to your classes and vice versa.
When you want to do this yourself, you could use a factory method that takes a BasicBSONObject, checks which keys and values it contains, and uses this information to create and return an object of the appropriate class.
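A minimal sketch of that factory approach, written against the Map<String, Object> interface that BasicBSONObject implements, so it needs no driver dependency. The Shape/Circle/Square classes and the key names are hypothetical, purely for illustration:

```java
import java.util.Map;

// Hypothetical sketch of the factory-method approach. Since BasicBSONObject
// implements Map<String, Object>, the factory can be written against the
// Map interface and passed the document directly.
public class ShapeFactory {

    interface Shape { double area(); }

    static class Circle implements Shape {
        final double radius;
        Circle(double radius) { this.radius = radius; }
        public double area() { return Math.PI * radius * radius; }
    }

    static class Square implements Shape {
        final double side;
        Square(double side) { this.side = side; }
        public double area() { return side * side; }
    }

    // Inspect the document's keys and cast values to decide which class to build.
    static Shape fromDocument(Map<String, Object> doc) {
        if (doc.containsKey("radius")) {
            return new Circle(((Number) doc.get("radius")).doubleValue());
        } else if (doc.containsKey("side")) {
            return new Square(((Number) doc.get("side")).doubleValue());
        }
        throw new IllegalArgumentException("Unrecognized document keys: " + doc.keySet());
    }
}
```

Note the cast through Number rather than directly to Double: in a schema-less store the same field may come back as Integer in one document and Double in another.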

Related

Mapping neo4j ogm query results to java objects

I'm collecting information from a Neo4j database, but the values I return are picked from multiple nodes, so what I'm returning is basically a table with some properties. For this example, let's say I return the properties color:String, name:String, count:String. I query these results using session.query(*QUERY*, queryParams).
Now, when I get the results, I want to map it to an existing Java Object, that I created to hold this data. This is kind of different to the 'normal' mapping, as in general, you want to map your graph nodes to objects that represent those nodes. Here, my POJOs have nothing to do with the graph nodes.
I managed to do this using custom CompositeAttributeConverter classes for each of my data-objects, but I feel there must be a better solution than writing a new class for every new object.
You might want to take a look at executing arbitrary Cypher queries using the Session object. You can get an Iterable<Map<String, Object>> from the returned Result object, which you can process directly or copy into a collection of Map results.
Or, if you have APOC Procedures installed, you can always write up a query to return your results as a JSON string, and convert that to JSON objects in Java with the appropriate library and use those as needed.
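For the first suggestion, mapping those Iterable<Map<String, Object>> rows into POJOs reduces to one small generic step instead of a CompositeAttributeConverter per class. The ColorCount class below is hypothetical, and the rows are plain Maps, as the Result object would yield them:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Hypothetical sketch: Session.query(...) yields Map<String, Object> rows;
// a single mapping loop turns each row into a domain object.
public class RowMapping {

    // Illustrative POJO matching the color/name/count columns in the question.
    static class ColorCount {
        final String color; final String name; final String count;
        ColorCount(String color, String name, String count) {
            this.color = color; this.name = name; this.count = count;
        }
    }

    // Convert each raw result row into a POJO by reading named columns.
    static List<ColorCount> map(Iterable<Map<String, Object>> rows) {
        List<ColorCount> out = new ArrayList<>();
        for (Map<String, Object> row : rows) {
            out.add(new ColorCount(
                    (String) row.get("color"),
                    (String) row.get("name"),
                    (String) row.get("count")));
        }
        return out;
    }
}
```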

Java objects to Hbase

I'm currently using the Kite API + Avro to persist Java objects to HBase. But due to various problems I'm looking for an alternative.
I've been reading about:
Phoenix
Native Hbase Api.
But are there other alternatives?
The idea is to save and load the Java objects to HBase and use them in a Java application.
If you're storing your objects in the Value portion of the KeyValue pair, then it's really just an array / sequence of bytes (i.e. in the code for KeyValue class there is a getValue method which returns a byte array).
At this point, you're down to object serialization, and there is a host of libraries you can use, with varying ease of use, performance characteristics, and implementation details. Avro is one serialization library which stores the schema with each record, but you could in theory use:
Standard Java serialization (implement Serializable)
Kryo
Protobuf
Just to name a few. You may want to investigate the strengths and tradeoffs of each library and balance them against the type of objects you plan to store (i.e. are they all effectively the same type of object, or do they vary widely in type? Are they going to be long-lived, i.e. years, with the expectation of schema evolution and backwards compatibility, etc.?)
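As a point of comparison, the first option (standard Java serialization) can be sketched with only the JDK; the byte array it produces is exactly the kind of payload that would be stored as the KeyValue's value. No HBase dependency is used here, since the round trip itself is the point:

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.io.UncheckedIOException;

// Sketch of standard Java serialization to and from a byte array,
// as would be stored in the Value portion of an HBase KeyValue.
public class JavaSerialization {

    // Serialize any Serializable object to a byte array.
    static byte[] toBytes(Serializable obj) {
        try {
            ByteArrayOutputStream bos = new ByteArrayOutputStream();
            ObjectOutputStream oos = new ObjectOutputStream(bos);
            oos.writeObject(obj);
            oos.close();
            return bos.toByteArray();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Read the object back from the stored bytes.
    static Object fromBytes(byte[] bytes) {
        try (ObjectInputStream in = new ObjectInputStream(new ByteArrayInputStream(bytes))) {
            return in.readObject();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        } catch (ClassNotFoundException e) {
            throw new IllegalStateException(e);
        }
    }
}
```

The tradeoff named above applies directly: this format is the easiest to adopt but ties the stored bytes to the Java class definition, which makes long-term schema evolution harder than with Avro or Protobuf.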
Phoenix is a JDBC API to HBase. It handles most SQL types (except intervals), and you can store arbitrary Java objects using the binary data type. But if you are only storing binary data, you could just as easily stick with plain HBase. If you can coerce your data into standard SQL types, Phoenix may be a good option.
If you want to stick with the Hadoop/HBase code you can have your complex class implement org.apache.hadoop.io.Writable.
import org.apache.hadoop.io.WritableUtils;

// Some complex Java object that implements
// org.apache.hadoop.io.Writable
SomeObject myObject = new SomeObject();

// Serialize the object to a byte array
// for storage in HBase
byte[] byteArr = WritableUtils.toByteArray(myObject);

How do I retrieve blob (containing serialized java object) from DB and convert them to xml?

I have some Java objects stored in an Oracle database. I wish to know the structure and content of the objects, so I want to retrieve the blobs and convert them to XML or any other displayable form.
Is this possible? If yes, how?
I would extract the binary object from the database, recreate the Java object in memory (so you can also verify the data is valid), and then use a library like Protostuff to quickly serialize the object to XML.
The advantage of using Protostuff is that you don't need anything but the Java object. The object "schema" is computed at runtime if needed.
Consider also that Protostuff supports a lot of different formats, like JSON, Protocol Buffers, YAML, etc.
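If you want a dependency-free sketch of the same two steps, the JDK's java.beans.XMLEncoder can stand in for Protostuff, provided the stored class follows JavaBean conventions. The Customer bean below is hypothetical, and the blob bytes are fabricated in memory; in the real case they would come from the Oracle BLOB column (e.g. Blob.getBytes):

```java
import java.beans.XMLEncoder;
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.io.UncheckedIOException;

public class BlobToXml {

    // A JavaBean stand-in for whatever class was originally stored in the blob.
    public static class Customer implements Serializable {
        private static final long serialVersionUID = 1L;
        private String name;
        public Customer() {}
        public String getName() { return name; }
        public void setName(String name) { this.name = name; }
    }

    // Helper used here only to fabricate blob bytes for the demo.
    static byte[] serialize(Serializable obj) {
        try {
            ByteArrayOutputStream bos = new ByteArrayOutputStream();
            ObjectOutputStream oos = new ObjectOutputStream(bos);
            oos.writeObject(obj);
            oos.close();
            return bos.toByteArray();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Step 1: recreate the Java object from the blob's bytes.
    static Object deserialize(byte[] blobBytes) {
        try (ObjectInputStream in = new ObjectInputStream(new ByteArrayInputStream(blobBytes))) {
            return in.readObject();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        } catch (ClassNotFoundException e) {
            throw new IllegalStateException(e);
        }
    }

    // Step 2: render the recreated object as XML for inspection.
    static String toXml(Object obj) {
        ByteArrayOutputStream bos = new ByteArrayOutputStream();
        XMLEncoder enc = new XMLEncoder(bos);
        enc.writeObject(obj);
        enc.close();
        return bos.toString();
    }
}
```

XMLEncoder only records properties with public getters and setters, so Protostuff remains the better choice when the stored classes are not clean JavaBeans.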

Purpose of the Lazy* classes in the MongoDB Java API

The MongoDB Java driver documentation of the package org.bson mentions various "Lazy" versions of other classes. Unfortunately the Javadocs of these classes can barely be called documentation.
What is their purpose and how does their behavior differ from the normal versions?
Under normal operation, the driver creates and consumes documents through the DBObject Map-like interface. When inserting documents, it iterates over the map to convert it to the corresponding BSON representation. When querying, it creates new documents by putting key-value pairs into the map.
But there are times when you want to work with raw BSON and not pay the cost of all this serialization and deserialization. That's what the lazy DBObject implementations are for. Instead of treating them as a map, a custom encoder instead writes the bytes directly to the BSON stream. Similarly, a custom decoder writes the raw bytes directly into the lazy DBObject.
In this context, the meaning of the term lazy is that, since the lazy equivalents still have to implement the DBObject interface, they do so by "lazily" interpreting the raw BSON byte array that they contain.
One last note: the lazy DBObject classes are very likely not going to be included in the upcoming 3.0 release of the driver, as the entire serialization mechanism is changing in a way that is not compatible with lazy DBObjects. There will be equivalent functionality, but it will not be API-compatible.

Java Programming - Spring and JDBCTemplate - Use query, queryForList or queryForRowSet?

My Java (JDK 6) project uses Spring and JdbcTemplate for all its database access. We recently upgraded from Spring 2.5 to Spring 3 (RC1). The project does not use an ORM such as Hibernate, nor EJB.
If I need to read a bunch of records, and do some internal processing with them, it seems like there are several (overloaded) methods: query, queryForList and queryForRowSet
What should be the criteria to use one instead of the other? Are there any performance differences? Best practices?
Can you recommend some external references for further research on this topic?
I find that the standard way to read a list is via the query() methods rather than any of the other approaches. The main difference between query and the other methods is that you'll have to implement one of the callback interfaces (RowMapper, RowCallbackHandler, or ResultSetExtractor) to handle your result set.
A RowMapper is likely what you'll find yourself using most of the time. It's used when each row of the result set corresponds to one object in your list. You only have to implement a single method, mapRow, where you populate the type of object that goes in your row and return it. Spring also has a BeanPropertyRowMapper which can populate the objects in a list by matching the bean property names to the column names (note that this class is provided for convenience, not performance).
A RowCallbackHandler is more useful when you need your results to be more than just a simple list. You'll have to manage the return object yourself if you are using this approach. I usually find myself using it when I need a map structure as my return type (i.e. for grouped data for a tree table, or if I'm creating a custom cache keyed by the primary key).
A ResultSetExtractor is used when you want to control the iteration of the results yourself. You implement a single method, extractData, whose result becomes the return value of the call to query. I only find myself using this if I have to build some custom data structure that is too complex to build with either of the other callback interfaces.
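To make the RowMapper contract concrete without a live database, here is an illustrative, non-Spring sketch in which a Map<String, Object> stands in for java.sql.ResultSet. The interface and classes are simplified stand-ins, not Spring's actual signatures:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Illustrative sketch of the RowMapper callback pattern: the template owns
// the iteration, the callback only knows how to turn one row into one object.
public class CallbackSketch {

    // Simplified stand-in for Spring's RowMapper interface.
    interface RowMapper<T> {
        T mapRow(Map<String, Object> row, int rowNum);
    }

    // Hypothetical domain class for illustration.
    static class Employee {
        final String name; final int salary;
        Employee(String name, int salary) { this.name = name; this.salary = salary; }
    }

    // Analogous to JdbcTemplate.query(sql, rowMapper): one object per row.
    static <T> List<T> query(List<Map<String, Object>> rows, RowMapper<T> mapper) {
        List<T> result = new ArrayList<>();
        for (int i = 0; i < rows.size(); i++) {
            result.add(mapper.mapRow(rows.get(i), i));
        }
        return result;
    }
}
```

The design point is the inversion of control: the caller never touches the iteration or resource handling, which in real Spring code is where connection and ResultSet cleanup live.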
The queryForList() methods are valuable in that you don't have to implement these callback methods. There are two ways to use queryForList. The first is if you're only querying a single column from the database (for example a list of strings): you can use the versions of the method that take a Class as an argument to automatically get a list of objects of that class.
When calling the other implementations of queryForList() you'll get back a list in which each entry is a map with one entry per column, keyed by column name. While this is nice in that you are saved the expense of writing the callback methods, dealing with this data structure is quite unwieldy. You'll find yourself doing a lot of casting since the map's values are of type Object.
I've actually never seen the queryForRowSet methods used in the wild. These will load the entire result of the query into a CachedRowSet object wrapped by a Spring SqlRowSet. I see a big downside in using this object: if you're passing the SqlRowSet around to the other layers of your application, you're coupling those layers to your data access implementation.
You shouldn't see any huge performance differences between any of these calls except as I mentioned with the BeanPropertyRowMapper. If you're working with some complex manipulation of a large result set, you might be able to get some performance gains from writing an optimized ResultSetExtractor for your specific case.
If you want to learn more I would consult the Spring JDBC documentation and the JavaDoc for the classes I've mentioned. You can also take a look at some of the books on the Spring Framework. Though it's a bit dated Java Development with the Spring Framework has a very good section on working with the JDBC framework. Most of all, I would say just try writing some code with each method and see what works best for you.
Since you are in the wonderful Generics land, what you may really want to do is to use SimpleJdbcTemplate and use its query() methods for Lists of objects and queryForObject() for individual objects. Reasoning for this simply is that they're even easier to use than the ones in JdbcTemplate.
One small addition to the excellent answers above: additional methods, like queryForInt, queryForLong, queryForMap, queryForObject, etc., might seem like good options at times if you're running a simple query and expect a single row.
However, if you could get 0 or 1 rows back, the queryForList method is generally easier, otherwise you'd have to catch IncorrectResultSizeDataAccessException. I learned that the hard way.
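A small sketch of that idiom: treat the result as the List that queryForList returns and branch on its size, instead of catching IncorrectResultSizeDataAccessException. The helper below is hypothetical, not part of Spring:

```java
import java.util.List;

// Hedged sketch of the 0-or-1-row idiom: an optional single result
// reduces to a size check on the returned list.
public class SingleRow {

    // Returns the single element, null for an empty result,
    // and fails loudly if the query unexpectedly matched several rows.
    static <T> T singleOrNull(List<T> results) {
        if (results.isEmpty()) {
            return null;
        }
        if (results.size() > 1) {
            throw new IllegalStateException("Expected at most one row, got " + results.size());
        }
        return results.get(0);
    }
}
```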
