Solr: Using the Block Join Children Query Parser - java

I am currently evaluating the Block Join Children Query Parser as described here.
Therefore I have created the following collection:
curl "http://localhost:8983/solr/admin/collections?action=CREATE&name=nestedPerson&numShards=6"
Then I have inserted these two documents:
curl http://localhost:8983/solr/nestedPerson/update?commitWithin=3000 -d '<add>
<doc>
<field name="id">p1</field>
<field name="deceased">false</field>
<doc>
<field name="id">c1</field>
<field name="firstName">Bob</field>
</doc>
</doc>
<doc>
<field name="id">p2</field>
<field name="deceased">true</field>
<doc>
<field name="id">c2</field>
<field name="firstName">Max</field>
</doc>
</doc>
</add>'
Now I issue this query:
{!child of="id:p1"}firstName:Bob
Unfortunately this results in this error:
"msg": "Parent query yields document which is not matched by parents filter, docID=0",
How can the parent query (I guess that the part id:p1 is meant) yield a document that is not matched by the filter?

Take a look at the Solr Wiki that you refer to again here. Note the following:
The syntax for this parser is: q={!child of=<allParents>}<someParents>. The parameter allParents is a filter that matches only parent documents
In your example, the query is {!child of="id:p1"}firstName:Bob. The field id is used in <allParents>, but id is contained in both parent and child documents, so it cannot serve as a filter that matches only parents.
You need to introduce a field that only parent documents have, such as <field name="content_type">parentDocument</field> from the wiki. Once all parent documents (and only parent documents) have this field, you could submit the query as:
q={!parent which="content_type:parentDocument"}firstName:Bob
This would match child documents for firstName:Bob and return their parents. In a similar fashion, use q={!child of=<allParents>}<someParents> to match parent documents and return their children, for example q={!child of="content_type:parentDocument"}id:p1.
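When such a query is sent over HTTP (for example with curl), the local-params syntax has to be URL-encoded. The following is a minimal sketch of how the two query shapes could be built and encoded in Java. It assumes the content_type:parentDocument marker field from the wiki has been added to every parent document; the class and method names are hypothetical and not part of any Solr API.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;

final class BlockJoinQueries {
    // Assumed marker: a filter that matches ALL parent documents and nothing else.
    static final String ALL_PARENTS = "content_type:parentDocument";

    /** Builds a query returning the children of the parents matched by someParents. */
    static String childrenOf(String someParents) {
        return "{!child of=\"" + ALL_PARENTS + "\"}" + someParents;
    }

    /** Builds a query returning the parents of the children matched by someChildren. */
    static String parentsOf(String someChildren) {
        return "{!parent which=\"" + ALL_PARENTS + "\"}" + someChildren;
    }

    /** URL-encodes a query so it can be used as the value of the q parameter. */
    static String urlEncode(String query) {
        return URLEncoder.encode(query, StandardCharsets.UTF_8);
    }
}
```

For instance, BlockJoinQueries.urlEncode(BlockJoinQueries.childrenOf("id:p1")) produces %7B%21child+of%3D%22content_type%3AparentDocument%22%7Did%3Ap1, which can be appended after q= in the request URL.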

Related

Mapping XML with JAXB, when the fields are generic, and the actual field names are mapped elsewhere?

In my java(/spring/hibernate) web app, I am contending with XML like this (I've simplified it down a lot for example purposes - I cannot modify the XML as I'm receiving it from a third party - I only control the client code, and client domain objects - there is no XSD or WSDL to represent this XML either):
<?xml version="1.0" encoding="utf-16"?>
<Records count="22321">
<Metadata>
<FieldDefinitions>
<FieldDefinition id="4444" name="name" />
<FieldDefinition id="5555" name="hair_color" />
<FieldDefinition id="6666" name="shoe_size" />
<FieldDefinition id="7777" name="last_comment"/>
<!-- around 100 more of these -->
</FieldDefinitions>
</Metadata>
<!-- Several complex object we don't care about here -->
<Record contentId="88484848475" >
<Field id="4444" type="9">
<Reference id="56765">Joe Bloggs</Reference>
</Field>
<Field id="5555" type="4">
<ListValues>
<ListValue id="290711" displayName="Red">Red</ListValue>
</ListValues>
</Field>
<Field id="6666" type="4">
<ListValues>
<ListValue id="24325" displayName="10">10</ListValue>
</ListValues>
</Field>
<Field id="7777" type="1">
<P>long form text here with escaped XML here too
don't need to process or dereference the xml here,
just need to get it as string in my pojo</P>
</Field>
</Record>
<Record><!-- another record obj here with same fields --> </Record>
<Record><!-- another record obj here with same fields--> </Record>
<!-- thousands more records in the sameish format -->
</Records>
The XML contains a 'records' element, which contains some metadata, then lots of 'record' elements. Each record element contains lots of 'field' entries.
My goal would be to use JAXB to unmarshal this XML into a large collection of 'record' objects, so I could do something like this:
List<Record> unmarshalledRecords = this.getRecordsFromXML(stringOfXmlShownAbove);
where each record would look like this:
public class Record {
private String name;
private String hairColor;
private String shoeSize;
private String lastComment;
//lots more fields
//getters and setters for these fields
}
However, I've never needed to dereference field names in jaxb - is that even possible with jaxb - or do I need to write some messy/hard to maintain code with a stax parser?
None of the examples I can find online touch on anything like this - any help would be greatly appreciated.
Thank you!
I don't think JAXB supports complex mapping logic like this. A couple of options that I can think of:
Transform the XML using FreeMarker or XSLT (I hate XSLT) to an XML format that matches your desired model before parsing with JAXB, e.g.:
<Records>
<Record>
<Name>Joe Bloggs</Name>
<HairColour>Red</HairColour>
...
</Record>
</Records>
Parse the xml as is and write an adapter wrapper in the java layer which adapts from the inbound jaxb objects to your more "user friendly" model. The adapter layer could call into the jaxb objects under the hood so you could later serialize back to xml after changes
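The dereferencing step behind the second option can be sketched as follows. Note that this is not JAXB: it uses the JDK's built-in DOM parser instead, and it returns plain maps rather than the Record POJO from the question. The class name RecordFieldMapper is hypothetical, and the sketch assumes each Field's value is simply the text content of its nested elements, as in the sample XML.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;

final class RecordFieldMapper {

    /** Returns one map per Record element, keyed by the name from FieldDefinitions. */
    static List<Map<String, String>> parse(String xml) {
        Document doc;
        try {
            doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalArgumentException("Could not parse records XML", e);
        }

        // Build the id -> name lookup from the metadata block.
        Map<String, String> namesById = new HashMap<>();
        NodeList definitions = doc.getElementsByTagName("FieldDefinition");
        for (int i = 0; i < definitions.getLength(); i++) {
            Element definition = (Element) definitions.item(i);
            namesById.put(definition.getAttribute("id"), definition.getAttribute("name"));
        }

        // Resolve every Field of every Record against that lookup.
        List<Map<String, String>> records = new ArrayList<>();
        NodeList recordNodes = doc.getElementsByTagName("Record");
        for (int i = 0; i < recordNodes.getLength(); i++) {
            Element record = (Element) recordNodes.item(i);
            Map<String, String> values = new LinkedHashMap<>();
            NodeList fields = record.getElementsByTagName("Field");
            for (int j = 0; j < fields.getLength(); j++) {
                Element field = (Element) fields.item(j);
                String name = namesById.get(field.getAttribute("id"));
                if (name != null) { // skip fields without a definition
                    values.put(name, field.getTextContent().trim());
                }
            }
            records.add(values);
        }
        return records;
    }
}
```

An adapter could then copy values such as the "hair_color" entry into the corresponding POJO property. For input with thousands of records a streaming parser may be a better fit than DOM, since this sketch holds the whole document in memory.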

How to properly outline an Apache Solr document?

What is the difference between delta import and full import in Apache Solr?
What naming conventions should be followed when writing deltaImportQuery and deltaQuery (ID, TXT_ID, etc.)? Are there any references or tutorials explaining in detail the differences and relations between deltaImportQuery and deltaQuery: what each one is, what it does, and how to write it?
How do I configure multiple entities in one document? Suppose there are three tables in the database, T1, T2 and T3: how is this configured in schema.xml, given that only one <uniqueKey>somename</uniqueKey> is considered per schema.xml file?
How should BLOB input from MySQL be parsed? Using CONVERT(column_name USING utf8) AS alias_name solves this, but what is the right convention? Other methods are also available, such as TikaEntityProcessor or writing a custom BLOB transformer.
As with ORM, are there any concepts explaining how to denormalize and outline an Apache Solr document? Is there a showcase project covering all use cases and features?
How do I configure entities like this in data-config.xml?
<dataConfig>
<dataSource type="JdbcDataSource" driver="com.mysql.jdbc.Driver" url="jdbc:mysql://localhost:3306/solrdemo" user="root" password="123" batchSize='1' />
<document name="user">
<entity name ="user1" query = "SELECT * FROM user1">
<field column='id' name='id1' />
</entity>
<entity name ="user2" query = "SELECT * FROM user2">
<field column='id' name='id2' />
</entity>
<entity name ="user3" query = "SELECT * FROM user3">
<field column='id' name='id3' />
</entity>
</document>
</dataConfig>
With the above configuration, which id should be configured as the <uniqueKey></uniqueKey> in schema.xml?
The result of the above configuration is:
Indexing completed. Added/Updated: 2,866 documents. Deleted 0 documents. (Duration: 03s)
Indexing completes successfully, but 0 documents are added or updated. How can this be resolved?
Overall, are there any references on proper conventions and configurations for working with Apache Solr?

Less-than operator not working in Solr query

I'm using Solr 4.7 with CoreContainer to create the core, EmbeddedSolrServer to connect, and ModifiableSolrParams to fetch the data.
I have configured "solrconfig.xml" with an "import" requestHandler to import the data, and a separate data config file as follows:
<dataConfig>
<dataSource driver="com.mysql.jdbc.Driver"
url="jdbc:mysql://localhost:3306/koupon"
user="root"
password="root" />
<document>
<entity name="koupon"
query="SELECT k.kouponid as kid, k.name, k.image,k.description,k.discount as discount,
k.startdate,k.enddate,k.actualamount,
k.discountamount,k.discountamount, c.name as category
FROM koupon.koupon k
INNER JOIN koupon.category c on c.categoryid = k.categoryid
where
k.startDate <= NOW() and endDate >= NOW()
AND c.isActive=true AND c.isDeleted=false AND k.isActive =true AND lower(k.status)=lower('approved')
order by k.kouponid">
<field column="kid" name="kid"/>
<field column="name" name="name"/>
<field column="image" name="image"/>
<field column="description" name="description"/>
<field column="startdate" name="startdate"/>
<field column="enddate" name="enddate"/>
<field column="actualamount" name="actualamount"/>
<field column="discountamount" name="discountamount"/>
<field column="discount" name="discount"/>
<field column="category" name="category"/>
</entity>
</document>
</dataConfig>
The query uses "k.startDate <= NOW() and endDate >= NOW()" to fetch the records whose date range contains the current time, but Solr does not accept it.
The only workaround I have is a start "TO" end range, but that is not an exact solution.
I have been stuck on this issue for a while. Does anyone know how to solve it?
You can try to use this MySQL query for indexing:
WHERE (NOW() BETWEEN k.startDate and k.endDate)
instead of:
k.startDate <= NOW() and endDate >= NOW()
to prevent the error
The value of attribute "query" associated with an element type
"entity" must not contain the '<' character
For your Solr query
startdate:[* TO NOW]
to work, make sure your startdate field conforms to the Solr date field type. For more info on this, check this link
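The error quoted above comes from the XML parser rather than from MySQL: a raw '<' is not allowed inside an attribute value. Besides rewriting the condition with BETWEEN, the comparison can be kept by escaping it as &lt;= in data-config.xml. The sketch below illustrates this with the JDK's DOM parser; the class name EntityQueryAttribute and the shortened query text are made up for the example.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

final class EntityQueryAttribute {

    /**
     * Returns the parsed value of the root element's "query" attribute,
     * or null if the XML is not well-formed.
     */
    static String read(String xml) {
        try {
            DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
            // DefaultHandler rethrows fatal errors instead of printing them to stderr.
            builder.setErrorHandler(new DefaultHandler());
            return builder.parse(new InputSource(new StringReader(xml)))
                    .getDocumentElement()
                    .getAttribute("query");
        } catch (SAXException e) {
            return null; // e.g. a raw '<' inside an attribute value
        } catch (ParserConfigurationException | IOException e) {
            throw new IllegalStateException("XML parser unavailable", e);
        }
    }
}
```

With this sketch, read("<entity query=\"startDate <= NOW()\"/>") returns null because the document is rejected, whereas the escaped form query="startDate &lt;= NOW()" parses and yields the attribute value startDate <= NOW(), so the SQL that reaches the database is unchanged.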

Retrieving attribute values depending on the value of another attribute using xpath

I have the following xml doc:
<database>
<order>
<data>
<field name="time" value="10:10:10" />
</data>
<data>
<field name="product" value="product_type_1">
<field name="attributeA" value="Foo" />
<field name="attributeB" value="Bar" />
</field>
<field name="attributeC" value="Jeam" />
<field name="attributeD" value="Beam" />
<field name="attributeE" value="Deam" />
</data>
</order>
<order>
<data>
<field name="time" value="10:10:11" />
</data>
<data>
<field name="product" value="product_type_2">
<field name="attributeF" value="Bravo" />
<field name="attributeG" value="Echo" />
</field>
<field name="attributeC" value="Jeam2" />
<field name="attributeD" value="Beam2" />
<field name="attributeJ" value="Charlie" />
<field name="attributeK" value="Tango" />
<field name="attributeL" value="Zulu" />
</data>
</order>
</database>
It is a set of "order" elements, but the "field" elements (both in quantity and type) depend on the value of the field whose name is "product". I am interested in extracting info depending on the value of the product. More specifically, I would like to end up with something like this table:
Time      Product         AttributeA  AttributeB  AttributeC  AttributeD
10:10:10  product_type_1  Foo         Bar         Jeam        Beam
10:10:11  product_type_2                          Jeam2       Beam2
In other words, I am trying to "cut" unnecessary info depending on the value of a child element of "order". I am trying to achieve this by using XPath (in Java) but I am stuck. It is impossible for me to emulate the "if" condition described above.
I am thinking of using an XPath query to retrieve one order element at a time, then querying for the product type and then choosing the appropriate XPath to retrieve the corresponding attributes, but that sounds really inefficient and slow.
Is it possible to do it more efficiently? Is xpath not the right answer here?
Thanks in advance.
P.S: The alignment and organization of the data you see above doesn't really matter as long as I retrieve the correct data then I am sure I'll be able to print them somehow.
If you want to use XPath, you will need at least XPath 3.0 or XQuery (this code is valid in both of them). Have a look at XQuery engines if you want to use this in Java, for example Saxon, BaseX, eXist DB, ...
for $order in /database/order
return string-join((
$order//field[@name='time']/@value,
$order//field[@name='product']/@value,
($order//field[@name='attributeA']/@value, '')[1],
($order//field[@name='attributeB']/@value, '')[1],
($order//field[@name='attributeC']/@value, '')[1],
($order//field[@name='attributeD']/@value, '')[1]),
'&#9;')
The pattern used for the attributes makes sure that missing values do not break the table layout (so for the second product type, attributes C and D do not slide into the columns of attributes A and B). &#9; is the tab character.
If you want to use Java for further processing of the output, I'd go with this: fetch all orders (/database/order) and loop over them. Then, for each order, use DOM (or XPath again) to fetch the nodes you need. Still, it seems that the question you asked is not your actual problem; using XQuery might lead to a cleaner solution.
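The loop described above can be sketched with the XPath 1.0 engine that ships with the JDK (javax.xml.xpath), so no XQuery processor is needed for this variant. The class name OrderTable and the fixed column list are assumptions for the example; the sketch relies on string() of an empty node-set being the empty string, which keeps the columns aligned when an attribute is absent.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathExpressionException;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

final class OrderTable {
    // Values of the name attribute to extract, in output column order.
    private static final String[] COLUMNS = {
        "time", "product", "attributeA", "attributeB", "attributeC", "attributeD"
    };

    /** Returns one tab-separated row per order; missing fields become empty cells. */
    static List<String> rows(String xml) {
        try {
            XPath xpath = XPathFactory.newInstance().newXPath();
            // Fetch all orders once, then query each one relative to its own node.
            NodeList orders = (NodeList) xpath.evaluate(
                    "/database/order",
                    new InputSource(new StringReader(xml)),
                    XPathConstants.NODESET);
            List<String> rows = new ArrayList<>();
            for (int i = 0; i < orders.getLength(); i++) {
                List<String> cells = new ArrayList<>();
                for (String column : COLUMNS) {
                    cells.add(xpath.evaluate(
                            "string(.//field[@name='" + column + "']/@value)",
                            orders.item(i)));
                }
                rows.add(String.join("\t", cells));
            }
            return rows;
        } catch (XPathExpressionException e) {
            throw new IllegalArgumentException("Could not evaluate XPath", e);
        }
    }
}
```

Each order costs only a handful of small relative queries against an already parsed tree, so the per-order loop is less wasteful than it may sound.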

Solr search is returning partial string matches

Using Solr 3.6.1, I have these fields in my schema.xml:
<field name="names" type="text_general" indexed="true" stored="false" multiValued="true"/>
<dynamicField name="names_*" type="text_general" indexed="true" stored="true"/>
The documentation in schema.xml states that "text_general" should:
tokenize with StandardTokenizer
remove stop words using the case-insensitive "stopwords.txt" (which is currently empty)
lowercase the string
At query time only, it also applies synonyms (the synonyms file is also empty at this time)
I have two documents indexed in Solr with this data for the field:
<!-- doc 1 -->
<str name="names_data">Name ABC Dev Loc</str>
<!-- doc 2 -->
<str name="names_data">Name ABC Dev Location</str>
When I execute the following query:
id:(doc1 OR doc2) AND names:Dev+Location
Both documents are returned. I would have expected that only doc2 would have been returned based on my understanding of how Solr's StandardTokenizer works.
Why does "Dev+Location" match "Dev Loc" and "Dev Location"?
The type text_general is probably configured to use a stemmer, which is treating Loc as a variant of Location.
You could configure the type to not use a stemmer, or try searching for the whole string using names:"Dev Location"
This might be why.
The query part names:Dev+Location only searches the names field for Dev. Because the Location term has no field name qualifier, it is searched against whatever the <defaultSearchField> is set to in schema.xml.
So you could try quoting the value, as in names:"Dev Location", or prefixing each term: names:Dev AND names:Location
