Solr, how to define Nested Documents in the schema.xml - java

I have a document with a nested document, and I want to define its schema in Solr. I have been reading the documentation, but I don't know how to define nested documents in schema.xml.
When I try to index a document with addBean, I get an error because the field obj1 is not in the schema, and I don't know how to define it.
I'm using Java objects with @Field annotations:
import org.apache.solr.client.solrj.beans.Field;

public class ObjToIndex {
    @Field
    String id;
    @Field
    String name;
    @Field
    ObjToIndex2 obj1;
}

public class ObjToIndex2 {
    @Field
    String id;
    @Field
    String lastName;
}
I don't know how to define in the schema a field obj1 with type "object" or something similar.

I don't know how to define in the schema a field obj1 with type "object" or something similar.
You can't, at least not in the way you are thinking of it.
Solr is not designed that way: the unit of information is a document composed of fields. Fields can have different types but, in short, they are only primitive types (strings, numbers, booleans); a field cannot be a complex object. Take a look at "How Solr Sees the World" in the documentation.
Does that mean you can't manage nested documents? No. You can manage them, with some caveats.
How to define the schema
First of all you need to define the internal _root_ field like this:
<field name="_root_" type="string" indexed="true" stored="false" docValues="false" />
Then you need to merge all the "primitive" fields of your parent and child objects into a single list of fields. This has some drawbacks, which are also mentioned in the Solr documentation:
you have to define an id field that exists for both parent and child objects, and you have to guarantee that it is globally unique
only fields that exist in both parent and child objects can be declared as "required"
For example, consider a slightly more complex case where multiple comments are nested under a blog post:
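import java.util.List;
import org.apache.solr.client.solrj.beans.Field;

public class BlogPost {
    @Field
    String id;
    @Field
    String title;
    // child = true: SolrJ indexes each Comment as a nested child document
    @Field(child = true)
    List<Comment> comments;
}

public class Comment {
    @Field
    String id;
    @Field
    String content;
}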
Then you need a schema like this:
<?xml version="1.0" encoding="UTF-8" ?>
<schema name="${solr.core.name}" version="1.5">
  <types>
    <fieldType name="string" class="solr.StrField" sortMissingLast="true" omitNorms="true"/>
    <fieldType name="long" class="solr.LongPointField" positionIncrementGap="0"/>
  </types>
  <fields>
    <!-- internal fields used by Solr -->
    <field name="_version_" type="long" indexed="true" stored="true" />
    <field name="_root_" type="string" indexed="true" stored="false" docValues="false" />
    <!-- merged fields of BlogPost and Comment; only id (present in both) is required -->
    <field name="id" type="string" indexed="true" stored="true" multiValued="false" required="true" />
    <field name="title" type="string" indexed="true" stored="true" multiValued="false" required="false" />
    <field name="content" type="string" indexed="true" stored="true" multiValued="false" required="false" />
  </fields>
  <uniqueKey>id</uniqueKey>
</schema>
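The merging rule can be checked mechanically. Below is a minimal sketch of a hypothetical helper (SchemaFieldMerger is not part of SolrJ) that uses plain Java reflection: it collects the non-static, non-List fields of a parent class and of the element type of its first List<X> field, then reports the union (every name needs a <field> in schema.xml) and the intersection (the only safe candidates for required="true"). The BlogPost and Comment classes mirror the example above but drop the SolrJ @Field annotations so the sketch runs on the JDK alone; it assumes one level of nesting with the children held in a List.

```java
import java.lang.reflect.Field;
import java.lang.reflect.Modifier;
import java.lang.reflect.ParameterizedType;
import java.lang.reflect.Type;
import java.util.List;
import java.util.Set;
import java.util.TreeSet;

// Mirrors of the example beans, without SolrJ annotations (assumption: children live in a List<X> field).
class BlogPost {
    String id;
    String title;
    List<Comment> comments;
}

class Comment {
    String id;
    String content;
}

class SchemaFieldMerger {

    /** Names of the "primitive" (non-static, non-List) fields declared directly on a class. */
    static Set<String> flatFields(Class<?> type) {
        Set<String> names = new TreeSet<>();
        for (Field f : type.getDeclaredFields()) {
            boolean isChildList = List.class.isAssignableFrom(f.getType());
            if (!f.isSynthetic() && !Modifier.isStatic(f.getModifiers()) && !isChildList) {
                names.add(f.getName());
            }
        }
        return names;
    }

    /** Element type of the first List<X> field, i.e. the child document class, or null if none. */
    static Class<?> childType(Class<?> parent) {
        for (Field f : parent.getDeclaredFields()) {
            if (List.class.isAssignableFrom(f.getType())
                    && f.getGenericType() instanceof ParameterizedType) {
                Type arg = ((ParameterizedType) f.getGenericType()).getActualTypeArguments()[0];
                if (arg instanceof Class<?>) {
                    return (Class<?>) arg;
                }
            }
        }
        return null;
    }

    /** Union of parent and child fields: every name needs a <field> entry in schema.xml. */
    static Set<String> mergedFields(Class<?> parent) {
        Set<String> all = flatFields(parent);
        Class<?> child = childType(parent);
        if (child != null) {
            all.addAll(flatFields(child));
        }
        return all;
    }

    /** Intersection: only these fields can safely be declared required="true". */
    static Set<String> requiredCandidates(Class<?> parent) {
        Set<String> shared = flatFields(parent);
        Class<?> child = childType(parent);
        if (child != null) {
            shared.retainAll(flatFields(child));
        }
        return shared;
    }
}
```

For the blog example this yields content, id and title as the merged fields and only id as a required candidate, which matches the schema above. The internal _version_ and _root_ fields are not derived from the beans and still have to be added by hand.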
How to index documents
Using SolrJ it is pretty straightforward: simply create your nested objects in Java, and the library will build the correct request when you add them.
final BlogPost myPost = new BlogPost();
myPost.id = "P1";
myPost.title = "My post";

final Comment comment1 = new Comment();
comment1.id = "P1.C1"; // ids must be unique across parents and children
comment1.content = "My first comment";

final Comment comment2 = new Comment();
comment2.id = "P1.C2";
comment2.content = "My second comment";

myPost.comments = List.of(comment1, comment2);
// ...
solrClient.addBean("my_core", myPost);
solrClient.commit("my_core"); // makes the documents searchable, unless auto-commit is configured
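SolrJ builds and sends the update request for you, but it can help to see what "nested" means on the Solr side. In Solr's XML update format a child document is simply a <doc> element nested inside its parent's <doc>, and Solr fills in _root_ on its own, which is why the schema declares that field but the beans never set it. The sketch below (the NestedUpdateXml class is hypothetical, and fields are passed as maps instead of beans to keep it self-contained) builds such a message for one parent and its children; it handles a single level of nesting only.

```java
import java.util.List;
import java.util.Map;

class NestedUpdateXml {

    /** Minimal escaping for XML text and double-quoted attribute values. */
    static String escape(String s) {
        return s.replace("&", "&amp;")
                .replace("<", "&lt;")
                .replace(">", "&gt;")
                .replace("\"", "&quot;");
    }

    /** Appends one <doc> with its fields, followed by its children as nested <doc> elements. */
    static void appendDoc(StringBuilder sb, Map<String, String> fields,
                          List<Map<String, String>> children) {
        sb.append("<doc>");
        for (Map.Entry<String, String> e : fields.entrySet()) {
            sb.append("<field name=\"").append(escape(e.getKey())).append("\">")
              .append(escape(e.getValue())).append("</field>");
        }
        for (Map<String, String> child : children) {
            appendDoc(sb, child, List.of()); // children have no children of their own in this sketch
        }
        sb.append("</doc>");
    }

    /** Builds an <add> message containing one parent document and its child documents. */
    static String addMessage(Map<String, String> parent, List<Map<String, String>> children) {
        StringBuilder sb = new StringBuilder("<add>");
        appendDoc(sb, parent, children);
        return sb.append("</add>").toString();
    }
}
```

Passing LinkedHashMap instances keeps the fields in insertion order. A message built this way for P1 and its comments has the shape Solr's /update handler accepts as XML.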
How to retrieve documents
This is a little trickier: to rebuild the original object and its children, you have to use the child doc transformer in your request (query.addField("[child]")):
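final SolrQuery query = new SolrQuery("*:*");
query.addField("*");       // all stored fields of each document
query.addField("[child]"); // child doc transformer: attaches the nested children to each parent
// Note: *:* also matches the child documents themselves, and depending on the
// Solr version the [child] transformer may also require a parentFilter parameter.
try {
    final QueryResponse response = solrClient.query("my_core", query);
    final List<BlogPost> documents = response.getBeans(BlogPost.class);
} catch (SolrServerException | IOException e) {
    // handle or rethrow
}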
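When debugging, it can be handy to run the same request outside Java. The SolrJ query above corresponds to a plain HTTP GET on the core's /select handler with q=*:* and fl=*,[child]. The hypothetical helper below builds that URL with JDK classes only (URLEncoder.encode with a Charset argument needs Java 10 or later); the base URL http://localhost:8983/solr is an assumption based on Solr's default port.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.stream.Collectors;

class ChildQueryUrl {

    /** Builds a /select URL from a base URL, a core name and form-encoded parameters. */
    static String selectUrl(String baseUrl, String core, Map<String, String> params) {
        String query = params.entrySet().stream()
                .map(e -> URLEncoder.encode(e.getKey(), StandardCharsets.UTF_8) + "="
                        + URLEncoder.encode(e.getValue(), StandardCharsets.UTF_8))
                .collect(Collectors.joining("&"));
        return baseUrl + "/" + core + "/select?" + query;
    }

    /** Same parameters as the SolrJ query: match everything, return all fields plus [child]. */
    static String childQueryUrl(String baseUrl, String core) {
        Map<String, String> params = new LinkedHashMap<>(); // keeps q before fl
        params.put("q", "*:*");
        params.put("fl", "*,[child]");
        return selectUrl(baseUrl, core, params);
    }
}
```

For example, childQueryUrl("http://localhost:8983/solr", "my_core") gives a URL ending in select?q=*%3A*&fl=*%2C%5Bchild%5D, which can be pasted into a browser or curl.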

To have a nested object, use @Field(child = true):
import org.apache.solr.client.solrj.beans.Field;

public class SolrBeanWithNested {
    @Field
    private String id;
    @Field(child = true)
    private MyNestedObject nested;
}
This is available since Solr 5.1.
See ticket: solr child

I believe this is correct:
How to write nested schema.xml in solr?
Some of the logic of "why" is described there, but the basic concept is that "child" documents are really just "related" or "linked" documents within the same schema. They may include different fields, but effectively they only add to the superset of fields in the overall schema.

Related

Mapping XML with JAXB, when the fields are generic, and the actual field names are mapped elsewhere?

In my Java (Spring/Hibernate) web app, I am contending with XML like this (I've simplified it a lot for example purposes). I cannot modify the XML, as I receive it from a third party; I only control the client code and the client domain objects. There is no XSD or WSDL describing this XML either:
<?xml version="1.0" encoding="utf-16"?>
<Records count="22321">
<Metadata>
<FieldDefinitions>
<FieldDefinition id="4444" name="name" />
<FieldDefinition id="5555" name="hair_color" />
<FieldDefinition id="6666" name="shoe_size" />
<FieldDefinition id="7777" name="last_comment"/>
<!-- around 100 more of these -->
</FieldDefinitions>
</Metadata>
<!-- Several complex object we don't care about here -->
<Record contentId="88484848475" >
<Field id="4444" type="9">
<Reference id="56765">Joe Bloggs</Reference>
</Field>
<Field id="5555" type="4">
<ListValues>
<ListValue id="290711" displayName="Red">Red</ListValue>
</ListValues>
</Field>
<Field id="6666" type="4">
<ListValues>
<ListValue id="24325" displayName="10">10</ListValue>
</ListValues>
</Field>
<Field id="7777" type="1">
<P>long form text here with escaped XML here too
don't need to process or dereference the xml here,
just need to get it as string in my pojo</P>
</Field>
</Record>
<Record><!-- another record obj here with same fields --> </Record>
<Record><!-- another record obj here with same fields--> </Record>
<!-- thousands more records in the sameish format -->
</Records>
The XML contains a 'records' element, which contains some metadata, then lots of 'record' elements. Each record element contains lots of 'field' entries.
My goal is to use JAXB to unmarshal this XML into a large collection of 'record' objects, so that I could do something like this:
List<Record> unmarshalledRecords = this.getRecordsFromXML(stringOfXmlShownAbove);
where each record would look like this:
public class Record {
private String name;
private String hairColor;
private String shoeSize;
private String lastComment;
//lots more fields
//getters and setters for these fields
}
However, I've never needed to dereference field names in JAXB. Is that even possible with JAXB, or do I need to write some messy, hard-to-maintain code with a StAX parser?
None of the examples I can find online touch on anything like this; any help would be greatly appreciated.
Thank you!
I don't think JAXB supports complex mapping logic like this. A couple of options I can think of:
Transform the XML using FreeMarker or XSLT (I hate XSLT) into an XML format that matches your desired model before parsing it with JAXB.
For example:
<Records>
<Record>
<Name>Joe Bloggs</Name>
<HairColour>Red</HairColour>
...
</Record>
</Records>
Parse the XML as is, and write an adapter wrapper in the Java layer that adapts the inbound JAXB objects to your more "user-friendly" model. The adapter layer could call into the JAXB objects under the hood, so you could later serialize back to XML after making changes.
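The first option does not need any extra library: the JDK ships an XSLT 1.0 processor behind javax.xml.transform. The following is a rough sketch under some assumptions, not a complete solution. The hypothetical RecordFlattener class uses an xsl:key to look up each Field's id among the FieldDefinition elements and emits an element named after the definition (name, hair_color, ...) holding the field's whitespace-normalized text. It assumes every Field id has a definition whose name is a valid XML element name, and it ignores the record attributes and the complex objects the question skips.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

class RecordFlattener {

    // XSLT 1.0: resolve each Field/@id against FieldDefinition/@id and use its @name as the element name.
    private static final String XSLT = String.join("\n",
        "<xsl:stylesheet version=\"1.0\" xmlns:xsl=\"http://www.w3.org/1999/XSL/Transform\">",
        "  <xsl:output method=\"xml\" indent=\"yes\"/>",
        "  <xsl:key name=\"def\" match=\"FieldDefinition\" use=\"@id\"/>",
        "  <xsl:template match=\"/Records\">",
        "    <Records>",
        "      <xsl:for-each select=\"Record\">",
        "        <Record>",
        "          <xsl:for-each select=\"Field\">",
        "            <xsl:element name=\"{key('def', @id)/@name}\">",
        "              <xsl:value-of select=\"normalize-space(.)\"/>",
        "            </xsl:element>",
        "          </xsl:for-each>",
        "        </Record>",
        "      </xsl:for-each>",
        "    </Records>",
        "  </xsl:template>",
        "</xsl:stylesheet>");

    /** Returns the flattened XML, ready to be unmarshalled into a simple JAXB model. */
    static String flatten(String xml) throws TransformerException {
        Transformer transformer = TransformerFactory.newInstance()
                .newTransformer(new StreamSource(new StringReader(XSLT)));
        StringWriter out = new StringWriter();
        transformer.transform(new StreamSource(new StringReader(xml)), new StreamResult(out));
        return out.toString();
    }
}
```

The output has the shape of the <Records>/<Record> example above, with the definition names as element names, so a plain JAXB class with one annotated property per field could unmarshal it.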

How do I get an object in the "many" part from a One-To-Many unidirectional relationship?

I don't know how to express exactly what I'm looking for, so I'll explain what I have and then what I'm trying to do.
I have a class model with 2 classes in a One-To-Many unidirectional relationship.
public class TaxType extends Entity implements java.io.Serializable {
//stuff
private Set<TaxTypeAttribute> listTaxTypeAttribute = new HashSet<>(0);
}
public class TaxTypeAttribute extends Entity implements java.io.Serializable {
private String attributeName;
//stuff, but no reference to TaxType
}
The Entity class is a kind of standard primary key; we call it the "OID design pattern", but I have no idea whether it has that name in English.
public class Entity {
private String oid;
//constructor, get and set
}
On the mapping, it goes like this:
<class name="entity.TaxType" table="taxttype" catalog="tax_type" optimistic-lock="version">
  <id name="oid" type="string">
    <column name="OIDtt" length="50" />
    <generator class="uuid2" />
  </id>
  <set name="listTaxTypeAttribute">
    <key column="OIDtt" not-null="true"/>
    <one-to-many class="entity.TaxTypeAttribute" />
  </set>
</class>
<!-- two separated files, this is just for showing -->
<class name="entity.TaxTypeAttribute" table="taxtypeattribute" catalog="tax_type" optimistic-lock="version">
<id name="oid" type="string">
<column name="OIDtta" length="50" />
<generator class="uuid2" />
</id>
<property name="attributeName" type="string">
<column name="attributeName" length="50" not-null="true" />
</property>
</class>
In one step of the program, I have a TaxType and the attributeName of a TaxTypeAttribute, but I need to get the full TaxTypeAttribute. I'm making queries through the Criteria API. I could call taxType.getListTaxTypeAttribute() and loop until I find the object, but I would like to know if there's a way to do it with a Hibernate query.
I've tried taking taxType.getOid() and using it together with the attributeName, but it throws an exception:
Exception in thread "main" org.hibernate.QueryException: could not resolve property: OIDtt of: entity.TaxTypeAttribute
Any clues? Thank you, and excuse my bad English.
EDIT: To follow our design patterns, we use this method for SELECT queries: Awful thing we use for queries.
The way I did it is this:
ArrayList<DTOCriteria> criteriaList = new ArrayList<>();
DTOCriteria c1 = new DTOCriteria();
c1.setAttribute("OIDtt");
c1.setOperation("=");
c1.setValue(taxType.getOid());
criteriaList.add(c1);
ArrayList<Object> found = search("TaxTypeAttribute");
I could add another DTOCriteria if I wanted ("attributeName"; "="; attributeName, for example), but if the first one doesn't work, that's kind of useless. I also tried (just because it's free) using "TaxType" as the attribute and the TaxType object as the value, but that didn't work either.
PS: the code works. I use it for other queries and it works; it just doesn't work for this one, or I don't know how to make it work. Maybe you can't do that kind of search, I don't know.
From an HQL/JPQL perspective, you could write your query as:
SELECT tta FROM TaxType tt JOIN tt.listTaxTypeAttribute tta
WHERE tt.oid = :oid
AND tta.attributeName = :attributeName
This query returns the TaxTypeAttribute instances that match the specified criteria. How you translate it into your own query mechanism is something I cannot help with.
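For comparison, the in-memory loop mentioned in the question can be written compactly with streams. This is only a sketch: the classes are simplified stand-ins for the mapped entities (no Entity base class, no Hibernate), and findAttribute is a hypothetical helper method. Unlike the HQL join above, it requires the whole listTaxTypeAttribute set to be loaded, so it only makes sense for small collections.

```java
import java.util.HashSet;
import java.util.Optional;
import java.util.Set;

// Simplified stand-in for the mapped TaxTypeAttribute entity (only the field needed here).
class TaxTypeAttribute {
    private final String attributeName;

    TaxTypeAttribute(String attributeName) {
        this.attributeName = attributeName;
    }

    String getAttributeName() {
        return attributeName;
    }
}

// Simplified stand-in for the mapped TaxType entity.
class TaxType {
    private final Set<TaxTypeAttribute> listTaxTypeAttribute = new HashSet<>(0);

    Set<TaxTypeAttribute> getListTaxTypeAttribute() {
        return listTaxTypeAttribute;
    }

    /** Hypothetical helper: first loaded attribute with the given name, if any. */
    Optional<TaxTypeAttribute> findAttribute(String attributeName) {
        return listTaxTypeAttribute.stream()
                .filter(a -> attributeName.equals(a.getAttributeName()))
                .findFirst();
    }
}
```

Since the set is unordered, "first" is only meaningful when attribute names are unique within a TaxType.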

Hibernate - Mapping subclass POJO to same table as its parent

In our code we have something along the lines of these two Java classes:
public class MailMessage {
private String id;
private String message;
private String sentTime;
// getters and setters
}
public class MailMessageWithRecipients extends MailMessage {
private String recipients;
// getter and setter for recipients
}
The purpose of separating out the recipients list is that, since it can be very large, we would like to avoid loading it when possible. We would like to map both MailMessage and MailMessageWithRecipients to a single table, mail_messages. However, if I try to do this in our Hibernate mapping file:
<class name="MailMessage" table="mail_messages" dynamic-insert="true" dynamic-update="true">
<id name="id" column="message_id" length="32">
<generator class="uuid" />
</id>
<property name="message" column="message" />
<property name="sentTime" column="sentTime" />
</class>
<class name="MailMessageWithRecipients" table="mail_messages" dynamic-insert="true" dynamic-update="true">
<id name="id" column="message_id" length="32">
<generator class="uuid" />
</id>
<property name="message" column="message" />
<property name="sentTime" column="sentTime" />
<property name="recipients">
<column name="recipients" sql-type="longblob" />
</property>
</class>
And then write a query something like this:
Query query = session.createQuery("from MailMessage where id = :id");
query.setParameter("id", id);
MailMessage message = (MailMessage) query.uniqueResult();
I get an "org.hibernate.NonUniqueResultException: query did not return a unique result" exception when the query is executed, even when there is only a single row in the mail_messages table. I suspect this is because Hibernate sees that row as both a MailMessage and a MailMessageWithRecipients. Is there a way to map these two classes to the same table like this while keeping the parent-child Java relationship? Related discussions I've seen recommend a component relationship in the mapping file, but it seems to me that this would involve removing the inheritance structure and turning MailMessageWithRecipients into a facade for MailMessage. If that's the only way, I'll do it, but I was wondering whether there is a mapping that keeps the inheritance structure intact, as that would involve significantly less refactoring of our code.

Castor XML Mapping and java.util.Map

I've been using Castor these past couple of days to try to get some readable serialization going between my Java program and XML. Though it has a few faults, Castor's automatic XML generation via reflection is actually very functional. Unfortunately, one thing the examples largely leave out is dealing with generics. The reflection-based approach does a wonderful job as it is, but since it inadvertently grabs a lot of redundant data just because methods start with get___(), I wanted to write my own mapping file to avoid that.
Firstly, it seems fair that the attributes of a "field" element should include "type". However, the documentation does not say what to do if this type is abstract or an interface. What should I put as the type then?
Secondly, most "collection" type objects supported by Castor (List, Vector, Collection, Set, etc.) only require one generic type, so specifying "type" as the element type plus collection="true" is enough. However, the documentation does not say what to do with a collection like a Map, where two types are necessary. How can I specify both the key type and the value type?
Any help at all would be greatly appreciated!
For the second of my questions:
When mapping a Map or a Table, you need to redefine org.exolab.castor.mapping.MapItem within the bind-xml element inside your field element. Example taken from here:
<class name="some.example.Clazz">
<field name="a-map" get-method="getAMap" set-method="setAMap">
<bind-xml ...>
<class name="org.exolab.castor.mapping.MapItem">
<field name="key" type="java.lang.String">
<bind-xml name="id"/>
</field>
<field name="value" type="com.acme.Foo"/>
</class>
</bind-xml>
</field>
</class>
Also, omit the type attribute from the parent field element.
For my first question, the trick is NOT to specify the type in the field element and to let Castor infer it by itself. If you have definitions for the classes that could appear there, Castor will use them automatically. For example:
<class name="some.example.Clazz">
<!-- can contain condition1 or condition2 elements -->
<field name="condition" collection="arraylist" required="true">
<bind-xml name="condition" node="element" />
</field>
</class>
<class name="some.example.condition1">
  <field name="oneField">
    <bind-xml name="fieldOne" />
  </field>
</class>
<class name="some.example.condition2">
  <field name="anotherField">
    <bind-xml name="fieldTwo" />
  </field>
</class>
When Castor marshals to XML, it writes condition1- or condition2-style XML into the "condition" field of Clazz while still recording the actual runtime type of each item.

How do you pass parameters to Hibernate's subselect tag?

The example at the end of Hibernate section 5.1.3 does not show how to pass parameters.
There is no difference between a view and a base table for a Hibernate mapping. This is transparent at the database level, although some DBMS do not support views properly, especially with updates. Sometimes you want to use a view, but you cannot create one in the database (i.e. with a legacy schema). In this case, you can map an immutable and read-only entity to a given SQL subselect expression:
<class name="Summary">
<subselect>
select item.name, max(bid.amount), count(*)
from item
join bid on bid.item_id = item.id
group by item.name
</subselect>
<synchronize table="item"/>
<synchronize table="bid"/>
<id name="name"/>
...
</class>
Is it possible? And if so, how?
Thanks,
Franz
I don't think it is possible, because the mapping file is a static description.
Since Hibernate 3 you can use formulas to map these kinds of read-only calculated fields. Example:
@Formula("(SELECT b.BANK_NAME FROM " +
         "  BANK_INFORMATION b, BILLING_AGENT_BANK ba " +
         "  WHERE ba.CNPJ = COMPANY_CNPJ " +
         "  AND b.BANK_ID = ba.BANK_ID)")
public String getBankName() {
    return bankName;
}
This example uses an annotated property, but you can do the same in the mapping file.
In NHibernate:
<class name="Blog" mutable="false">
<subselect>
SELECT Blog.Id, Blog.Author, Blog.Title, Comment.Comment
FROM Blog INNER JOIN Comment ON Blog.Id = Comment.Blog_id
WHERE Comment.LanguageId = :blogcomment.languageId
</subselect>
<id name="Id">
<generator class="assigned" />
</id>
<property name="Author" />
<property name="Title" />
<property name="Comment" />
</class>
