Why can't I query SolrJ for a URL? - java

I have a Solr schema that has a "url" field:
<fieldType name="url" class="solr.TextField"
positionIncrementGap="100">
</fieldType>
<fields>
<field name="id" type="string" stored="true" indexed="true"/>
<field name="url" type="url" stored="true" indexed="false"/>
<field name="chunkNum" type="long" stored="true" indexed="false"/>
<field name="origScore" type="float" stored="true" indexed="true"/>
<field name="concept" type="string" stored="true" indexed="true"/>
<field name="text" type="text" stored="true" indexed="true"
required="true"/>
<field name="title" type="text" stored="true" indexed="true"/>
<field name="origDoctype" type="string" stored="true" indexed="true"/>
<field name="keywords" type="string" stored="true" indexed="true"/>
</fields>
<uniqueKey>id</uniqueKey>
<defaultSearchField>text</defaultSearchField>
<solrQueryParser defaultOperator="OR"/>
I can add SolrInputDocuments with all the fields and query them back using the text field and/or with a filter query on "concept". But when I try to query a specific url, I don't get any results. My code looks like:
SolrQuery query = new SolrQuery();
query.setQuery("url:" + ClientUtils.escapeQueryChars(url));
//query.setQuery("*:*");
//query.addFilterQuery("url:" + ClientUtils.escapeQueryChars(url));
List<Chunk> retCode = null;
try
{
QueryResponse resp = solrServer.query(query);
SolrDocumentList docs = resp.getResults();
retCode = new ArrayList<Chunk>(docs.size());
for (SolrDocument doc : docs)
{
LOG.debug("got doc " + doc);
Chunk chunk = new Chunk(doc);
retCode.add(chunk);
}
}
catch (SolrServerException e)
{
LOG.error("caught a server exception", e);
}
return retCode;
I've tried with and without the ClientUtils.escapeQueryChars and I've tried using a query of "url:" or a filter query on url. I never get anything back. Any hints?

Whats the actual type of "url"? In your schema.xml you should have a set of "fieldType" elements which list the actual Solr backing classes and filters that make up a data type.
For your "fieldType" for the "url" you are interested in the "class" attribute. E.g. the most basic free-text type has a class="solr.TextField". You might be using a type that has some wacky filters on it and Lucene/Solr ends up indexing your data differently from what you would expect.
Download Luke and look at your index visually:
http://www.getopt.org/luke/
It will help you "look" at your data - like I said, maybe its stored differently than what you expect.

Dammit, another stupid one on my part: Thanks to Cody's suggestion of using Luke, I discovered this inconvenient part of the schema:
<field name="url" type="url" stored="true" indexed="false"/>
Changing that to indexed="true" fixed the problem.

Related

How to configure schema.xml for geoSpatial Search in solr

I want to filter data nearby location using Solr Spatial Search but can't find any data
I want to filter data nearby location using Solr Spatial Search but can't find any data.
I have done the following changes in schema.xml:
<fieldType name="location" class="solr.LatLonPointSpatialField" docValues="true" />
<field name="latlong" type="text_general" indexed="true" stored="false" multiValued="false"/>
<field name="location" type="location" indexed="true" stored="true" multiValued="false"/>
Query :
q=:&fq={!geofilt%20sfield=location}&pt=22.303894,70.802162&d=5
Response :
{
"responseHeader":{
"zkConnected":true,
"status":0,
"QTime":0,
"params":{
"q":"*:*",
"pt":"22.303894,70.802162",
"d":"5",
"fq":"{!geofilt sfield=location}"}},
"response":{"numFound":0,"start":0,"numFoundExact":true,"docs":[]
}}
Expected Document:
{
"responseHeader":{
"zkConnected":true,
"status":0,
"QTime":0,
"params":{
"q":"*:*",
"indent":"true",
"q.op":"OR",
"_":"1671173790419"}},
"response":{"numFound":1,"start":0,"numFoundExact":true,"docs":[
{
"brandnd_name":["Luxury"],
"country":["India"],
"latitude":[22.2913494],
"longitude":[70.8023621],
"updated_date":["54:03.6"],
"added_date":["0018-05-14T11:19:00Z"],
"active_ind":[true],
"synxis_hotel_id":[37273],
"id":"4598a178-d984-41f1-b009-bbda37469b7c",
"_version_":1752351636887437312},

How can I create a blob index field correctly with Solr 5?

I think the title of my question explains much of what I need. I am using the Data Importer Handler of Apache SOLR 5. I configured my solrconfig.xml, schema.xml and data-config.xml. It's working for now.
However, I need to add one more field. An Oracle Blob field. First, let me show my configurations:
data-config.xml
<dataConfig>
<!-- Datasource -->
<dataSource name="myDS"
setReadOnly="true"
driver="oracle.jdbc.OracleDriver"
url="jdbc:oracle:thin:#//server.example.com:1521/service_name"
user="user"
password="pass"/>
<document name="products">
<entity name="product"
dataSource="myDS"
query="select * from products"
pk="id"
processor="SqlEntityProcessor">
<field column="id" name="id" />
<field column="name" name="name" />
<field column="price" name="price" />
<field column="store" name="store" />
<!-- I've added this blob field -->
<field column="picture" name="picture" />
</entity>
</document>
</dataConfig>
solrconfig.xml
<requestHandler name="/products" class="org.apache.solr.handler.dataimport.DataImportHandler">
<lst name="defaults">
<str name="config">data-config.xml</str>
</lst>
</requestHandler>
<!-- JDBCs -->
<lib dir="../../../lib" />
My fields in schema.xml
<field name="id" type="string" indexed="true" stored="true" required="true" multiValued="false" />
<field name="_version_" type="long" indexed="true" stored="true"/>
<field name="_text" type="string" indexed="true" stored="false" multiValued="true"/>
<field name="name" type="string" indexed="true" stored="true"/>
<field name="price" type="float" indexed="true" stored="true"/>
<!-- BLOB field -->
<field name="picture" type="binary" indexed="true" stored="true"/>
<copyField source="*" dest="_text"/>
<!-- ommited solr default fields -->
Now, when I start a full-importer, SOLR only indexes some records. This is the output after SOLR finish the importing:
Indexing completed. Added/Updated: 64 documents. Deleted 0 documents. (Duration: 04s)
Requests: 1 (0/s), Fetched: 1369 (342/s), Skipped: 0, Processed: 64 (16/s)
Started: less than a minute ago
As you can see, I have 1369 records, but SOLR only index 64 documents. If I remove the field picture from schema or, set index and stored attributes to false, SOLR import all documents.
I opened the SOLR log, and found this error when importing the blob field:
3436212 [Thread-19] WARN org.apache.solr.handler.dataimport.SolrWriter – Error creating document : SolrInputDocument(fields: [name=PRODUCTNAME, price=PRICE, store=STORE, picture=oracle.sql.BLOB#4130607a, _version_=1497915495941144576])
org.apache.solr.common.SolrException: ERROR: [doc=<ID>] Error adding field 'picture'='oracle.sql.BLOB#4130607a' msg=Illegal character .
at org.apache.solr.update.DocumentBuilder.toDocument(DocumentBuilder.java:176)
at org.apache.solr.update.AddUpdateCommand.getLuceneDocument(AddUpdateCommand.java:78)
at org.apache.solr.update.DirectUpdateHandler2.addDoc0(DirectUpdateHandler2.java:240)
at org.apache.solr.update.DirectUpdateHandler2.addDoc(DirectUpdateHandler2.java:166)
at org.apache.solr.update.processor.RunUpdateProcessor.processAdd(RunUpdateProcessorFactory.java:69)
at org.apache.solr.update.processor.UpdateRequestProcessor.processAdd(UpdateRequestProcessor.java:51)
at org.apache.solr.update.processor.DistributedUpdateProcessor.doLocalAdd(DistributedUpdateProcessor.java:931)
at org.apache.solr.update.processor.DistributedUpdateProcessor.versionAdd(DistributedUpdateProcessor.java:1085)
at org.apache.solr.update.processor.DistributedUpdateProcessor.processAdd(DistributedUpdateProcessor.java:697)
at org.apache.solr.update.processor.LogUpdateProcessor.processAdd(LogUpdateProcessorFactory.java:104)
at org.apache.solr.handler.dataimport.SolrWriter.upload(SolrWriter.java:71)
at org.apache.solr.handler.dataimport.DataImportHandler$1.upload(DataImportHandler.java:263)
at org.apache.solr.handler.dataimport.DocBuilder.buildDocument(DocBuilder.java:511)
at org.apache.solr.handler.dataimport.DocBuilder.buildDocument(DocBuilder.java:415)
at org.apache.solr.handler.dataimport.DocBuilder.doFullDump(DocBuilder.java:330)
at org.apache.solr.handler.dataimport.DocBuilder.execute(DocBuilder.java:232)
at org.apache.solr.handler.dataimport.DataImporter.doFullImport(DataImporter.java:416)
at org.apache.solr.handler.dataimport.DataImporter.runCmd(DataImporter.java:480)
at org.apache.solr.handler.dataimport.DataImporter$1.run(DataImporter.java:461)
Caused by: java.lang.IllegalArgumentException: Illegal character .
at org.apache.solr.common.util.Base64.base64toInt(Base64.java:150)
at org.apache.solr.common.util.Base64.base64ToByteArray(Base64.java:117)
at org.apache.solr.schema.BinaryField.createField(BinaryField.java:89)
at org.apache.solr.schema.FieldType.createFields(FieldType.java:305)
at org.apache.solr.update.DocumentBuilder.addField(DocumentBuilder.java:48)
at org.apache.solr.update.DocumentBuilder.toDocument(DocumentBuilder.java:123)
... 18 more
I checked querying directly against database, and it's working fine. I am using SOLR 5, ojdbc7 and Java 8. How can I use the binary field correctly in SOLR?
Update
I've changed the properties of picture in schema.xml setting indexed=false. This way:
<!-- BLOB field -->
<field name="picture" type="binary" indexed="false" stored="true"/>
Then, I restarted SOLR, reloaded my core, and did a Full-Import again. No success and same exception. The same 64 documents that I described above was imported and the field picture does not appear in JSON response. The query I execute is:
/select?q=*%3A*&wt=json&indent=true

Solr not recognizing camelcased field names over update/extract?

I've been working with SolrJ for months now without any problem with a schema that follows the following pattern, with underscores and camelcasing:
<field name="museum_eventActor" type="text" indexed="true" stored="true" multiValued="true"/>
<field name="museum_eventType" type="text" indexed="false" stored="true" multiValued="true"/>
<field name="museum_eventPlace" type="text" indexed="true" stored="true" multiValued="true"/>
<field name="museum_eventDate" type="text" indexed="true" stored="true" multiValued="true"/>
We recently decided that we wanted to index some PDF content, so I started using curl to test some content:
curl "http://localhost:8090/solr-museum/archival/update/extract?stream.file=/home/user/Downloads/transcript.pdf&stream.contentType=application/pdf&literal.id=C1-1-5&literal.museum_eventActor=test&fmap.content=text&commit=true"
But I was noticing that although Solr was acknowledging my fields, none of them were showing up in my index. Solr log says:
792560 [http-8090-1] INFO org.apache.solr.update.processor.LogUpdateProcessor – [archivalRecord] webapp=/solr-museum path=/update/extract params={fmap.content=text&commit=true&literal.museum_eventActor=&literal.id=C1-1-5&stream.contentType=application/pdf&stream.file=/home/user/Downloads/transcript.pdf} {add=[C1-1-5 (1467000805262360576)],commit=} 0 698
and the index looks like:
<doc>
<str name="id">C1-1-5</str>
<long name="_version_">1467000805262360576</long>
<arr name="content">
<str>1467000805262360576</str>
</arr>
</doc>
After a day of playing around and searching online, I found this SO question which made me wonder about camelcasing: Solr - Missing Required Field
So I modified my schema to look something like this:
<field name="museum_eventactor" type="text" indexed="true" stored="true" multiValued="true"/>
<field name="museumeventtype" type="text" indexed="false" stored="true" multiValued="true"/>
<field name="museumeventPlace" type="text" indexed="true" stored="true" multiValued="true"/>
<field name="museum_eventDate" type="text" indexed="true" stored="true" multiValued="true"/>
And fired over this request:
curl "http://localhost:8090/solr-museum/archival/update/extract?stream.file=/home/user/Downloads/transcript.pdf&stream.contentType=application/pdf&literal.id=C1-1-5&literal.museum_eventactor=test&literal.museumeventtype=test&literal.museumeventPlace=test&literal.museum_eventDate=test&fmap.content=text&commit=true"
And sure enough, the fields with camelcasing aren't being recognized:
<doc>
<arr name="museum_eventactor">
<str>test</str>
</arr>
<str name="id">C1-1-5</str>
<arr name="museumeventtype">
<str>test</str>
</arr>
<long name="_version_">1467001178833289216</long>
</doc>
Now I've searched through a lot of the Solr documentation, and although they point out repeatedly that there are very few restrictions on field names if you're willing to accept the consequences, never have I ever encountered a scenario where camelcasing isn't a valid naming scheme especially in Java. I'm kind of stumped as to why this might be happening. Does anyone have any ideas that might explain this behavior?

DataSource.getInheritsFrom() fails with ClassCastException

In SmartGwtEE project I have hierarchy of DataSources described in .ds.xml files, here is some of them:
BaseElement_DS.ds.xml
<DataSource ID="BaseElement_DS" serverConstructor="com.isomorphic.jpa.JPADataSource"
beanClassName="lnudb.server.model.BaseElement">
<fields>
<field name="id" type="sequence" hidden="true" primaryKey="true" />
<field name="name" type="text" title="Name" required="true" />
<field name="dsId" type="text" title="Datasource" hidden="true"/>
</fields>
</DataSource>
Human_DS.ds.xml
<DataSource ID="Human_DS" serverConstructor="com.isomorphic.jpa.JPADataSource"
beanClassName="org.zasadnyy.lnudb.server.model.Human" inheritsFrom="BaseElement_DS"
useParentFieldOrder="true">
<fields>
<field name="surname" type="text" />
<field name="birthday" type="date" title="Birthday" required="false" />
</fields>
</DataSource>
Problem: when I try to get parent datasource id in code
String parentDsId = DataSource.get("Human_DS").getInheritsFrom();
ClassCastExeption is raised from inside of getInheritsFrom() method:
java.lang.ClassCastException: com.google.gwt.core.client.JavaScriptObject$ cannot be cast to java.lang.String
I will be grateful for any help.
This way you won't get the exception any longer:
String parentDsId = DataSource.get("Human_DS").getInheritsFrom() + "";
However, I'm not sure whether this is "ok" for your purposes. If this is not good for you, try to create a Javascript object and assign the before mentioned value to it. I hope this helps.

Trying to serialize an object compactly using Castor

I'm using Castor to write out a map of user ID's to time intervals. I'm using it to save and resume progress in a lengthy task, and I'm trying to make the XML as compact as possible. My map is from string userID's to a class that contains the interval timestamps, along with additional transient data that I don't need to serialize.
I'm able to use a nested class mapping:
...
<field name="userIntervals" collection="map">
<bind-xml name="u">
<class name="org.exolab.castor.mapping.MapItem">
<field name="key" type="string"><bind-xml name="n" node="attribute"/></field>
<field name="value" type="my.package.TimeInterval"/>
</class>
</bind-xml>
</field>
...
<class name="my.package.TimeInterval">
<map-to xml="ti"/>
<field name="intervalStart" type="long"><bind-xml name="s" node="attribute"/></field>
<field name="intervalEnd" type="long"><bind-xml name="e" node="attribute"/></field>
</class>
...
And get output that looks like:
<u n="36164639"><value s="1292750896000" e="1292750896000"/></u>
What I'd like is the name, start, and end of the user in a single node like this.
<u n="36164639" s="1292750896000" e="1292750896000"/>
But I can't seem to finagle it so the start and end attributes in the "value" go in the same node as the "key". Any ideas would be greatly appreciated.
Nash,
I think to arrange the castor mapping is bit tricky.
If you want to have structure like
<u n="36164639" s="1292750896000" e="1292750896000"/>
Then you need to create a new pojo file where it will be having
all the three fields Key,intervalStart,intervalEnd.
And let the File name as KeyTimeInterval
And map it like the below.
<field name="userIntervals" collection="map">
<class name="org.exolab.castor.mapping.MapItem">
<field name="u" type="my.package.KeyTimeInterval">
<bind-xml name="u" node="element"/>
</field>
</class>
</field>
<class name="my.package.KeyTimeInterval">
<field name="key" type="String">
<bind-xml name="n" node="attribute"/></field>
<field name="intervalStart" type="long">
<bind-xml name="s" node="attribute"/></field>
<field name="intervalEnd" type="long">
<bind-xml name="e" node="attribute"/></field>
</class>
I think you should be able to use location on s and e. Try this:-
...
<class name="my.package.TimeInterval">
<map-to xml="ti"/>
<field name="intervalStart" type="long">
<bind-xml name="s" location="u" node="attribute"/>
</field>
<field name="intervalEnd" type="long">
<bind-xml name="e" location="u" node="attribute"/>
</field>
</class>
Am answering my own question here, since there is a solution that does exactly what I want, and there's actually an error in the explanation at http://www.castor.org/xml-mapping.html#Sample-3:-Using-the-container-attribute - the container attribute is exactly what's needed here.
Changing one line in the mapping:
<field name="value" type="my.package.TimeInterval" container="true"/>
did exactly what I wanted, it didn't create a subelement for the value, just mapped the fields into the existing parent element. Since then, I've used this quite a few times to map multiple-value classes into their parent.
The error of course is the documentation states you do this by setting the container attribute to false. Of course, it should be true.

Categories