Apache Solr: query syntax explanation - java

I am confused by the syntax of the q query parameter:
If I write q=*:*, I see 2 results.
If I omit q, I do not see anything.
If I write q=price:*, I see 2 results.
If I write q=price, I get 0 results.
Update:
q=price:0 returns 1 result.
Can you explain the differences between these queries?
In particular, I want to understand what the fourth variant means.
indexed documents:
<add><doc>
<field name="id">3007WFP</field>
<field name="name">Dell Widescreen UltraSharp 3007WFP</field>
<field name="manu">Dell, Inc.</field>
<!-- Join -->
<field name="manu_id_s">dell</field>
<field name="cat">electronics</field>
<field name="cat">monitor</field>
<field name="features">30" TFT active matrix LCD, 2560 x 1600, .25mm dot pitch, 700:1 contrast</field>
<field name="includes">USB cable</field>
<field name="weight">401.6</field>
<field name="price">2199</field>
<field name="popularity">6</field>
<field name="inStock">true</field>
<!-- Buffalo store -->
<field name="store">43.17614,-90.57341</field>
<field name="cat">XXX</field>
</doc></add>
<add>
<doc>
<field name="id">SOLR1000</field>
<field name="name">Solr, the Enterprise Search Server</field>
<field name="manu">Apache Software Foundation</field>
<field name="cat">software</field>
<field name="cat">search</field>
<field name="cat">XXX</field>
<field name="features">Advanced Full-Text Search Capabilities using Lucene</field>
<field name="features">Optimized for High Volume Web Traffic</field>
<field name="features">Standards Based Open Interfaces - XML and HTTP</field>
<field name="features">Comprehensive HTML Administration Interfaces</field>
<field name="features">Scalability - Efficient Replication to other Solr Search Servers</field>
<field name="features">Flexible and Adaptable with XML configuration and Schema</field>
<field name="features">Good unicode support: héllo (hello with an accent over the e)</field>
<field name="price">0</field>
<field name="popularity">10</field>
<field name="inStock">true</field>
<field name="incubationdate_dt">2006-01-17T00:00:00.000Z</field>
</doc>
</add>

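If you do not give a field name, Solr searches the default search field. So your fourth query,
q=price, searches the default field for the term "price"; it does not refer to the price field at all.
That's why you are getting 0 results: no document contains the word "price" in the default field.
By contrast, q=*:* matches every document, q=price:* matches every document that has a value in the price field, and q=price:0 matches only the document whose price is 0.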
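To compare the variants side by side, here is a minimal sketch that builds the request URL for each query from the question. The base URL (host, port, and the core name collection1) is an assumption for illustration only; adjust it to your own setup. The comments describe what each query asks Solr's standard query parser to match.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;

class SolrQueryUrls {
    // Hypothetical base URL: host, port, and core name are assumptions.
    static final String BASE = "http://localhost:8983/solr/collection1/select";

    // Builds a select URL with the q parameter URL-encoded.
    static String selectUrl(String q) {
        try {
            return BASE + "?q=" + URLEncoder.encode(q, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            throw new IllegalStateException("UTF-8 is always supported", e);
        }
    }

    public static void main(String[] args) {
        // field:value syntax with wildcards on both sides: matches every document.
        System.out.println(selectUrl("*:*"));
        // Matches every document that has some value in the price field.
        System.out.println(selectUrl("price:*"));
        // Matches documents whose price field is 0.
        System.out.println(selectUrl("price:0"));
        // No field prefix: the term "price" is looked up in the default search field.
        System.out.println(selectUrl("price"));
    }
}
```

Note that the colon is encoded as %3A, so q=*:* is sent as q=*%3A*.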

Related

SAXParseException error in XML Parsing

Using a web service call, I got the following response from the server. Now I need to parse this response, extract all field values, and store them in String variables.
<?xml version="1.0" encoding="utf-8"?>
<ecomexpress-objects version="1.0"><object pk="1" model="awb">
<field type="BigIntegerField" name="awb_number">102019265</field>
<field type="CharField" name="orderid">8008444</field>
<field type="FloatField" name="actual_weight">2</field>
<field type="CharField" name="origin">DELHI-DSW</field>
<field type="CharField" name="destination">Mumbai - BOW</field>
<field type="CharField" name="current_location_name">Mumbai - BOW</field>
<field type="CharField" name="current_location_code">BOW</field>
<field type="CharField" name="customer">Ecom Express Private Limited - 32012</field>
<field type="CharField" name="consignee">BEECHAND VERMA</field>
<field type="CharField" name="pickupdate">22-Jan-2014</field>
<field type="CharField" name="status">Undelivered</field>
<field type="CharField" name="tracking_status">Undelivered</field>
<field type="CharField" name="reason_code">221 - Consignee Refused To Accept</field>
<field type="CharField" name="reason_code_description">Consignee Refused To Accept</field>
<field type="CharField" name="reason_code_number">221 </field>
<field type="CharField" name="receiver"></field>
<field type="CharField" name="expected_date" >15-Feb-2014</field>
<field type="CharField" name="last_update_date" ></field>
<field type="CharField" name="delivery_date" ></field>
<field type="CharField" name="ref_awb" >703063993</field>
<field type="CharField" name="rts_shipment" >0</field>
<field type="CharField" name="system_delivery_update" ></field>
<field type="CharField" name="rts_system_delivery_status" Undelivered</field>
<field type="CharField" name="rts_reason_code_number">777</field>
<field type="CharField" name="rts_last_update">22 Jan, 2014, 12:44 </field>
<field type="CharField" name="pincode" >400037</field>
<field type="CharField" name="city" >MUMBAI</field>
<field type="CharField" name="state" >Maharashtra</field>
<field name="scans">
</ecomexpress-objects>
If I try the following code to parse it:
String xml=result.toString();
try{
DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
DocumentBuilder db = dbf.newDocumentBuilder();
InputSource is;
is = new InputSource(new StringReader(xml));
Document doc = db.parse(is);
NodeList nodelist = doc.getChildNodes();
}
catch (SAXException e) {
e.printStackTrace();
} catch (ParserConfigurationException e) {
e.printStackTrace();
} catch (MalformedURLException e) {
e.printStackTrace();
} catch (IOException e) {
e.printStackTrace();
}
I get the following error.
org.xml.sax.SAXParseException: expected: /field read: ecomexpress-objects (position:END_TAG </ecomexpress-objects>#1:1917 in java.io.StringReader#39978dff)
I need to store all field values in respective string variables
There are several errors in your XML:
The object tag has no closing tag.
Undelivered appears inside the opening tag, where an attribute would go: <field type="CharField" name="rts_system_delivery_status" Undelivered</field>
<field name="scans"> should be closed, like <field name="scans"/>.
As long as the XML is invalid, you will keep getting exceptions.
Your XML is not valid.
The tag <field name="scans"> is not closed.
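Once the XML is well-formed, the field values can be collected by name. The following is a minimal sketch, not a definitive implementation: the class name FieldExtractor, the method extractFields, and the shortened SAMPLE document (a corrected subset of the response above) are assumptions made for illustration.

```java
import java.io.StringReader;
import java.util.LinkedHashMap;
import java.util.Map;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

class FieldExtractor {
    // Corrected, shortened version of the response: the object tag is closed,
    // the broken attribute is fixed, and the scans field is self-closed.
    static final String SAMPLE =
        "<?xml version=\"1.0\" encoding=\"utf-8\"?>"
        + "<ecomexpress-objects version=\"1.0\"><object pk=\"1\" model=\"awb\">"
        + "<field type=\"BigIntegerField\" name=\"awb_number\">102019265</field>"
        + "<field type=\"CharField\" name=\"orderid\">8008444</field>"
        + "<field type=\"CharField\" name=\"rts_system_delivery_status\">Undelivered</field>"
        + "<field name=\"scans\"/>"
        + "</object></ecomexpress-objects>";

    // Returns a map from each field's name attribute to its trimmed text.
    // Throws IllegalArgumentException if the XML cannot be parsed.
    static Map<String, String> extractFields(String xml) {
        try {
            DocumentBuilder db = DocumentBuilderFactory.newInstance().newDocumentBuilder();
            Document doc = db.parse(new InputSource(new StringReader(xml)));
            NodeList fields = doc.getElementsByTagName("field");
            Map<String, String> values = new LinkedHashMap<>();
            for (int i = 0; i < fields.getLength(); i++) {
                Element field = (Element) fields.item(i);
                values.put(field.getAttribute("name"), field.getTextContent().trim());
            }
            return values;
        } catch (Exception e) {
            throw new IllegalArgumentException("Could not parse XML", e);
        }
    }

    public static void main(String[] args) {
        Map<String, String> values = extractFields(SAMPLE);
        String orderId = values.get("orderid");
        String status = values.get("rts_system_delivery_status");
        System.out.println(orderId + " " + status);
    }
}
```

A map keyed by field name avoids declaring one String variable per field; individual values can still be copied into variables, as main does for orderid.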

Solr return only parent document on child query match

I have a set of indexed documents which have a pseudo parent-child relationship. Each child document has a reference to the parent document. Due to some availability complexity, these documents are not indexed in a way that supports block-join, i.e. instead of a nested structure, they are all flat. Here's an example:
<doc>
<field name="id">1</field>
<field name="title">Parent title</field>
<field name="doc_id">123</field>
</doc>
<doc>
<field name="id">2</field>
<field name="title">Child title1</field>
<field name="parent_doc_id">123</field>
</doc>
<doc>
<field name="id">3</field>
<field name="title">Child title2</field>
<field name="parent_doc_id">123</field>
</doc>
<doc>
<field name="id">4</field>
<field name="title">Misc title2</field>
</doc>
What I'm looking for is this: if I search for "title2", the result should bring back the following two docs, one matching the parent and one based on a regular match.
<doc>
<field name="id">1</field>
<field name="title">Parent title</field>
<field name="doc_id">123</field>
</doc>
<doc>
<field name="id">4</field>
<field name="title">Misc title2</field>
</doc>
With block-join support, I could have used Block Join Parent Query Parser,
q={!parent which="content_type:parentDocument"}title:title2
Transforming the result documents is an alternative, but ChildDocTransformerFactory only supports the reverse direction (adding children to a matched parent).
I'm wondering whether there is a different way to approach this query. Any pointers will be appreciated.
If you use Solr 6, you might be able to expand the results to include the parent records using the Graph query parser.

How can I create a blob index field correctly with Solr 5?

I think the title of my question explains much of what I need. I am using the Data Importer Handler of Apache SOLR 5. I configured my solrconfig.xml, schema.xml and data-config.xml. It's working for now.
However, I need to add one more field. An Oracle Blob field. First, let me show my configurations:
data-config.xml
<dataConfig>
<!-- Datasource -->
<dataSource name="myDS"
setReadOnly="true"
driver="oracle.jdbc.OracleDriver"
url="jdbc:oracle:thin:@//server.example.com:1521/service_name"
user="user"
password="pass"/>
<document name="products">
<entity name="product"
dataSource="myDS"
query="select * from products"
pk="id"
processor="SqlEntityProcessor">
<field column="id" name="id" />
<field column="name" name="name" />
<field column="price" name="price" />
<field column="store" name="store" />
<!-- I've added this blob field -->
<field column="picture" name="picture" />
</entity>
</document>
</dataConfig>
solrconfig.xml
<requestHandler name="/products" class="org.apache.solr.handler.dataimport.DataImportHandler">
<lst name="defaults">
<str name="config">data-config.xml</str>
</lst>
</requestHandler>
<!-- JDBCs -->
<lib dir="../../../lib" />
My fields in schema.xml
<field name="id" type="string" indexed="true" stored="true" required="true" multiValued="false" />
<field name="_version_" type="long" indexed="true" stored="true"/>
<field name="_text" type="string" indexed="true" stored="false" multiValued="true"/>
<field name="name" type="string" indexed="true" stored="true"/>
<field name="price" type="float" indexed="true" stored="true"/>
<!-- BLOB field -->
<field name="picture" type="binary" indexed="true" stored="true"/>
<copyField source="*" dest="_text"/>
<!-- omitted solr default fields -->
Now, when I start a full import, SOLR only indexes some records. This is the output after SOLR finishes importing:
Indexing completed. Added/Updated: 64 documents. Deleted 0 documents. (Duration: 04s)
Requests: 1 (0/s), Fetched: 1369 (342/s), Skipped: 0, Processed: 64 (16/s)
Started: less than a minute ago
As you can see, I have 1369 records, but SOLR only indexes 64 documents. If I remove the field picture from the schema, or set its indexed and stored attributes to false, SOLR imports all documents.
I opened the SOLR log, and found this error when importing the blob field:
3436212 [Thread-19] WARN org.apache.solr.handler.dataimport.SolrWriter – Error creating document : SolrInputDocument(fields: [name=PRODUCTNAME, price=PRICE, store=STORE, picture=oracle.sql.BLOB#4130607a, _version_=1497915495941144576])
org.apache.solr.common.SolrException: ERROR: [doc=<ID>] Error adding field 'picture'='oracle.sql.BLOB#4130607a' msg=Illegal character .
at org.apache.solr.update.DocumentBuilder.toDocument(DocumentBuilder.java:176)
at org.apache.solr.update.AddUpdateCommand.getLuceneDocument(AddUpdateCommand.java:78)
at org.apache.solr.update.DirectUpdateHandler2.addDoc0(DirectUpdateHandler2.java:240)
at org.apache.solr.update.DirectUpdateHandler2.addDoc(DirectUpdateHandler2.java:166)
at org.apache.solr.update.processor.RunUpdateProcessor.processAdd(RunUpdateProcessorFactory.java:69)
at org.apache.solr.update.processor.UpdateRequestProcessor.processAdd(UpdateRequestProcessor.java:51)
at org.apache.solr.update.processor.DistributedUpdateProcessor.doLocalAdd(DistributedUpdateProcessor.java:931)
at org.apache.solr.update.processor.DistributedUpdateProcessor.versionAdd(DistributedUpdateProcessor.java:1085)
at org.apache.solr.update.processor.DistributedUpdateProcessor.processAdd(DistributedUpdateProcessor.java:697)
at org.apache.solr.update.processor.LogUpdateProcessor.processAdd(LogUpdateProcessorFactory.java:104)
at org.apache.solr.handler.dataimport.SolrWriter.upload(SolrWriter.java:71)
at org.apache.solr.handler.dataimport.DataImportHandler$1.upload(DataImportHandler.java:263)
at org.apache.solr.handler.dataimport.DocBuilder.buildDocument(DocBuilder.java:511)
at org.apache.solr.handler.dataimport.DocBuilder.buildDocument(DocBuilder.java:415)
at org.apache.solr.handler.dataimport.DocBuilder.doFullDump(DocBuilder.java:330)
at org.apache.solr.handler.dataimport.DocBuilder.execute(DocBuilder.java:232)
at org.apache.solr.handler.dataimport.DataImporter.doFullImport(DataImporter.java:416)
at org.apache.solr.handler.dataimport.DataImporter.runCmd(DataImporter.java:480)
at org.apache.solr.handler.dataimport.DataImporter$1.run(DataImporter.java:461)
Caused by: java.lang.IllegalArgumentException: Illegal character .
at org.apache.solr.common.util.Base64.base64toInt(Base64.java:150)
at org.apache.solr.common.util.Base64.base64ToByteArray(Base64.java:117)
at org.apache.solr.schema.BinaryField.createField(BinaryField.java:89)
at org.apache.solr.schema.FieldType.createFields(FieldType.java:305)
at org.apache.solr.update.DocumentBuilder.addField(DocumentBuilder.java:48)
at org.apache.solr.update.DocumentBuilder.toDocument(DocumentBuilder.java:123)
... 18 more
I checked querying directly against database, and it's working fine. I am using SOLR 5, ojdbc7 and Java 8. How can I use the binary field correctly in SOLR?
Update
I've changed the properties of picture in schema.xml setting indexed=false. This way:
<!-- BLOB field -->
<field name="picture" type="binary" indexed="false" stored="true"/>
Then I restarted SOLR, reloaded my core, and ran a full import again. No success: same exception. The same 64 documents that I described above were imported, and the field picture does not appear in the JSON response. The query I execute is:
/select?q=*%3A*&wt=json&indent=true

Solr not recognizing camelcased field names over update/extract?

I've been working with SolrJ for months now without any problem with a schema that follows the following pattern, with underscores and camelcasing:
<field name="museum_eventActor" type="text" indexed="true" stored="true" multiValued="true"/>
<field name="museum_eventType" type="text" indexed="false" stored="true" multiValued="true"/>
<field name="museum_eventPlace" type="text" indexed="true" stored="true" multiValued="true"/>
<field name="museum_eventDate" type="text" indexed="true" stored="true" multiValued="true"/>
We recently decided that we wanted to index some PDF content, so I started using curl to test some content:
curl "http://localhost:8090/solr-museum/archival/update/extract?stream.file=/home/user/Downloads/transcript.pdf&stream.contentType=application/pdf&literal.id=C1-1-5&literal.museum_eventActor=test&fmap.content=text&commit=true"
But I was noticing that although Solr was acknowledging my fields, none of them were showing up in my index. Solr log says:
792560 [http-8090-1] INFO org.apache.solr.update.processor.LogUpdateProcessor – [archivalRecord] webapp=/solr-museum path=/update/extract params={fmap.content=text&commit=true&literal.museum_eventActor=&literal.id=C1-1-5&stream.contentType=application/pdf&stream.file=/home/user/Downloads/transcript.pdf} {add=[C1-1-5 (1467000805262360576)],commit=} 0 698
and the index looks like:
<doc>
<str name="id">C1-1-5</str>
<long name="_version_">1467000805262360576</long>
<arr name="content">
<str>1467000805262360576</str>
</arr>
</doc>
After a day of playing around and searching online, I found this SO question which made me wonder about camelcasing: Solr - Missing Required Field
So I modified my schema to look something like this:
<field name="museum_eventactor" type="text" indexed="true" stored="true" multiValued="true"/>
<field name="museumeventtype" type="text" indexed="false" stored="true" multiValued="true"/>
<field name="museumeventPlace" type="text" indexed="true" stored="true" multiValued="true"/>
<field name="museum_eventDate" type="text" indexed="true" stored="true" multiValued="true"/>
And fired over this request:
curl "http://localhost:8090/solr-museum/archival/update/extract?stream.file=/home/user/Downloads/transcript.pdf&stream.contentType=application/pdf&literal.id=C1-1-5&literal.museum_eventactor=test&literal.museumeventtype=test&literal.museumeventPlace=test&literal.museum_eventDate=test&fmap.content=text&commit=true"
And sure enough, the fields with camelcasing aren't being recognized:
<doc>
<arr name="museum_eventactor">
<str>test</str>
</arr>
<str name="id">C1-1-5</str>
<arr name="museumeventtype">
<str>test</str>
</arr>
<long name="_version_">1467001178833289216</long>
</doc>
Now I've searched through a lot of the Solr documentation, and although they point out repeatedly that there are very few restrictions on field names if you're willing to accept the consequences, never have I ever encountered a scenario where camelcasing isn't a valid naming scheme especially in Java. I'm kind of stumped as to why this might be happening. Does anyone have any ideas that might explain this behavior?

Trying to serialize an object compactly using Castor

I'm using Castor to write out a map of user ID's to time intervals. I'm using it to save and resume progress in a lengthy task, and I'm trying to make the XML as compact as possible. My map is from string userID's to a class that contains the interval timestamps, along with additional transient data that I don't need to serialize.
I'm able to use a nested class mapping:
...
<field name="userIntervals" collection="map">
<bind-xml name="u">
<class name="org.exolab.castor.mapping.MapItem">
<field name="key" type="string"><bind-xml name="n" node="attribute"/></field>
<field name="value" type="my.package.TimeInterval"/>
</class>
</bind-xml>
</field>
...
<class name="my.package.TimeInterval">
<map-to xml="ti"/>
<field name="intervalStart" type="long"><bind-xml name="s" node="attribute"/></field>
<field name="intervalEnd" type="long"><bind-xml name="e" node="attribute"/></field>
</class>
...
And get output that looks like:
<u n="36164639"><value s="1292750896000" e="1292750896000"/></u>
What I'd like is the name, start, and end of the user in a single node like this.
<u n="36164639" s="1292750896000" e="1292750896000"/>
But I can't seem to finagle it so the start and end attributes in the "value" go in the same node as the "key". Any ideas would be greatly appreciated.
Nash,
I think arranging the Castor mapping is a bit tricky.
If you want to have a structure like
<u n="36164639" s="1292750896000" e="1292750896000"/>
then you need to create a new POJO class that holds
all three fields: key, intervalStart, and intervalEnd.
Name the class KeyTimeInterval
and map it as shown below.
<field name="userIntervals" collection="map">
<class name="org.exolab.castor.mapping.MapItem">
<field name="u" type="my.package.KeyTimeInterval">
<bind-xml name="u" node="element"/>
</field>
</class>
</field>
<class name="my.package.KeyTimeInterval">
<field name="key" type="String">
<bind-xml name="n" node="attribute"/></field>
<field name="intervalStart" type="long">
<bind-xml name="s" node="attribute"/></field>
<field name="intervalEnd" type="long">
<bind-xml name="e" node="attribute"/></field>
</class>
I think you should be able to use the location attribute on s and e. Try this:
...
<class name="my.package.TimeInterval">
<map-to xml="ti"/>
<field name="intervalStart" type="long">
<bind-xml name="s" location="u" node="attribute"/>
</field>
<field name="intervalEnd" type="long">
<bind-xml name="e" location="u" node="attribute"/>
</field>
</class>
I'm answering my own question here, since there is a solution that does exactly what I want, and there's actually an error in the explanation at http://www.castor.org/xml-mapping.html#Sample-3:-Using-the-container-attribute. The container attribute is exactly what's needed here.
Changing one line in the mapping:
<field name="value" type="my.package.TimeInterval" container="true"/>
did exactly what I wanted: it didn't create a subelement for the value, and just mapped the fields into the existing parent element. Since then, I've used this quite a few times to map multiple-value classes into their parent.
The error is that the documentation states you do this by setting the container attribute to false; it should be true.
