I've been working with SolrJ for months now without any problem with a schema that follows the following pattern, with underscores and camelcasing:
<field name="museum_eventActor" type="text" indexed="true" stored="true" multiValued="true"/>
<field name="museum_eventType" type="text" indexed="false" stored="true" multiValued="true"/>
<field name="museum_eventPlace" type="text" indexed="true" stored="true" multiValued="true"/>
<field name="museum_eventDate" type="text" indexed="true" stored="true" multiValued="true"/>
We recently decided that we wanted to index some PDF content, so I started using curl to test some content:
curl "http://localhost:8090/solr-museum/archival/update/extract?stream.file=/home/user/Downloads/transcript.pdf&stream.contentType=application/pdf&literal.id=C1-1-5&literal.museum_eventActor=test&fmap.content=text&commit=true"
But I was noticing that although Solr was acknowledging my fields, none of them were showing up in my index. Solr log says:
792560 [http-8090-1] INFO org.apache.solr.update.processor.LogUpdateProcessor – [archivalRecord] webapp=/solr-museum path=/update/extract params={fmap.content=text&commit=true&literal.museum_eventActor=&literal.id=C1-1-5&stream.contentType=application/pdf&stream.file=/home/user/Downloads/transcript.pdf} {add=[C1-1-5 (1467000805262360576)],commit=} 0 698
and the index looks like:
<doc>
<str name="id">C1-1-5</str>
<long name="_version_">1467000805262360576</long>
<arr name="content">
<str>1467000805262360576</str>
</arr>
</doc>
After a day of playing around and searching online, I found this SO question which made me wonder about camelcasing: Solr - Missing Required Field
So I modified my schema to look something like this:
<field name="museum_eventactor" type="text" indexed="true" stored="true" multiValued="true"/>
<field name="museumeventtype" type="text" indexed="false" stored="true" multiValued="true"/>
<field name="museumeventPlace" type="text" indexed="true" stored="true" multiValued="true"/>
<field name="museum_eventDate" type="text" indexed="true" stored="true" multiValued="true"/>
And fired over this request:
curl "http://localhost:8090/solr-museum/archival/update/extract?stream.file=/home/user/Downloads/transcript.pdf&stream.contentType=application/pdf&literal.id=C1-1-5&literal.museum_eventactor=test&literal.museumeventtype=test&literal.museumeventPlace=test&literal.museum_eventDate=test&fmap.content=text&commit=true"
And sure enough, the fields with camelcasing aren't being recognized:
<doc>
<arr name="museum_eventactor">
<str>test</str>
</arr>
<str name="id">C1-1-5</str>
<arr name="museumeventtype">
<str>test</str>
</arr>
<long name="_version_">1467001178833289216</long>
</doc>
Now I've searched through a lot of the Solr documentation, and although they point out repeatedly that there are very few restrictions on field names if you're willing to accept the consequences, never have I ever encountered a scenario where camelcasing isn't a valid naming scheme especially in Java. I'm kind of stumped as to why this might be happening. Does anyone have any ideas that might explain this behavior?
Related
I want to filter data nearby location using Solr Spatial Search but can't find any data
I want to filter data nearby location using Solr Spatial Search but can't find any data.
I have done the following changes in schema.xml:
<fieldType name="location" class="solr.LatLonPointSpatialField" docValues="true" />
<field name="latlong" type="text_general" indexed="true" stored="false" multiValued="false"/>
<field name="location" type="location" indexed="true" stored="true" multiValued="false"/>
Query :
q=:&fq={!geofilt%20sfield=location}&pt=22.303894,70.802162&d=5
Response :
{
"responseHeader":{
"zkConnected":true,
"status":0,
"QTime":0,
"params":{
"q":"*:*",
"pt":"22.303894,70.802162",
"d":"5",
"fq":"{!geofilt sfield=location}"}},
"response":{"numFound":0,"start":0,"numFoundExact":true,"docs":[]
}}
Expected Document:
{
"responseHeader":{
"zkConnected":true,
"status":0,
"QTime":0,
"params":{
"q":"*:*",
"indent":"true",
"q.op":"OR",
"_":"1671173790419"}},
"response":{"numFound":1,"start":0,"numFoundExact":true,"docs":[
{
"brandnd_name":["Luxury"],
"country":["India"],
"latitude":[22.2913494],
"longitude":[70.8023621],
"updated_date":["54:03.6"],
"added_date":["0018-05-14T11:19:00Z"],
"active_ind":[true],
"synxis_hotel_id":[37273],
"id":"4598a178-d984-41f1-b009-bbda37469b7c",
"_version_":1752351636887437312},
I have followed the tutorials given in below link
Indexing csv file in solr
I have configured solr server in my local and
But when i try to post csv file using the below command
java -Dtype=text/csv -Durl=http://localhost:8983/solr/jcg/update -jar post.jar books.csv
I am getting below error response in command prompt
Any one help why i am getting the error response
Error:
<response>
<lst name="responseHeader"><int name="status">400</int><int name="QTime">111</int></lst><lst name="error"><lst name="metadata"><str name="error-class">org.apache.solr.common.SolrException</str><str name="root-error-class">org.apache.solr.common.SolrException</str></lst><str name=
"msg">ERROR: [doc=0553573403] unknown field 'cat'</str><int name="code">400</int></lst>
</response>
The tutorial you're following is quite old, so if you want follow the tutorial there are two options:
the tutorial you're following misses the creation of Schema configuration in jcg collection. In this case you should fix your managed-schema file to jcg configuration taking care to add the following fields as suggested in the Tutorial, then reload the configuration (or restart Solr). At this point the "Indexing the Data" step should work correctly.
<uniqueKey>id</uniqueKey>
<!-- Fields added for books.csv load-->
<field name="cat" type="text_general" indexed="true" stored="true"/>
<field name="name" type="text_general" indexed="true" stored="true"/>
<field name="price" type="tdouble" indexed="true" stored="true"/>
<field name="inStock" type="boolean" indexed="true" stored="true"/>
<field name="author" type="text_general" indexed="true" stored="true"/>
You have correctly added schema and fields but not reloaded the configuration, but not reloaded the collection. So just reload the configuration (or restart Solr) and continue with the tutorial.
On the other hand, if you're using an earlier version of Solr (6.4) I suggest to delete jcg collection and create it again:
bin/solr delete -c jcg
bin/solr create -c jcg -d ./server/solr/configsets/sample_techproducts_configs
I think the title of my question explains much of what I need. I am using the Data Importer Handler of Apache SOLR 5. I configured my solrconfig.xml, schema.xml and data-config.xml. It's working for now.
However, I need to add one more field. An Oracle Blob field. First, let me show my configurations:
data-config.xml
<dataConfig>
<!-- Datasource -->
<dataSource name="myDS"
setReadOnly="true"
driver="oracle.jdbc.OracleDriver"
url="jdbc:oracle:thin:#//server.example.com:1521/service_name"
user="user"
password="pass"/>
<document name="products">
<entity name="product"
dataSource="myDS"
query="select * from products"
pk="id"
processor="SqlEntityProcessor">
<field column="id" name="id" />
<field column="name" name="name" />
<field column="price" name="price" />
<field column="store" name="store" />
<!-- I've added this blob field -->
<field column="picture" name="picture" />
</entity>
</document>
</dataConfig>
solrconfig.xml
<requestHandler name="/products" class="org.apache.solr.handler.dataimport.DataImportHandler">
<lst name="defaults">
<str name="config">data-config.xml</str>
</lst>
</requestHandler>
<!-- JDBCs -->
<lib dir="../../../lib" />
My fields in schema.xml
<field name="id" type="string" indexed="true" stored="true" required="true" multiValued="false" />
<field name="_version_" type="long" indexed="true" stored="true"/>
<field name="_text" type="string" indexed="true" stored="false" multiValued="true"/>
<field name="name" type="string" indexed="true" stored="true"/>
<field name="price" type="float" indexed="true" stored="true"/>
<!-- BLOB field -->
<field name="picture" type="binary" indexed="true" stored="true"/>
<copyField source="*" dest="_text"/>
<!-- ommited solr default fields -->
Now, when I start a full-importer, SOLR only indexes some records. This is the output after SOLR finish the importing:
Indexing completed. Added/Updated: 64 documents. Deleted 0 documents. (Duration: 04s)
Requests: 1 (0/s), Fetched: 1369 (342/s), Skipped: 0, Processed: 64 (16/s)
Started: less than a minute ago
As you can see, I have 1369 records, but SOLR only index 64 documents. If I remove the field picture from schema or, set index and stored attributes to false, SOLR import all documents.
I opened the SOLR log, and found this error when importing the blob field:
3436212 [Thread-19] WARN org.apache.solr.handler.dataimport.SolrWriter – Error creating document : SolrInputDocument(fields: [name=PRODUCTNAME, price=PRICE, store=STORE, picture=oracle.sql.BLOB#4130607a, _version_=1497915495941144576])
org.apache.solr.common.SolrException: ERROR: [doc=<ID>] Error adding field 'picture'='oracle.sql.BLOB#4130607a' msg=Illegal character .
at org.apache.solr.update.DocumentBuilder.toDocument(DocumentBuilder.java:176)
at org.apache.solr.update.AddUpdateCommand.getLuceneDocument(AddUpdateCommand.java:78)
at org.apache.solr.update.DirectUpdateHandler2.addDoc0(DirectUpdateHandler2.java:240)
at org.apache.solr.update.DirectUpdateHandler2.addDoc(DirectUpdateHandler2.java:166)
at org.apache.solr.update.processor.RunUpdateProcessor.processAdd(RunUpdateProcessorFactory.java:69)
at org.apache.solr.update.processor.UpdateRequestProcessor.processAdd(UpdateRequestProcessor.java:51)
at org.apache.solr.update.processor.DistributedUpdateProcessor.doLocalAdd(DistributedUpdateProcessor.java:931)
at org.apache.solr.update.processor.DistributedUpdateProcessor.versionAdd(DistributedUpdateProcessor.java:1085)
at org.apache.solr.update.processor.DistributedUpdateProcessor.processAdd(DistributedUpdateProcessor.java:697)
at org.apache.solr.update.processor.LogUpdateProcessor.processAdd(LogUpdateProcessorFactory.java:104)
at org.apache.solr.handler.dataimport.SolrWriter.upload(SolrWriter.java:71)
at org.apache.solr.handler.dataimport.DataImportHandler$1.upload(DataImportHandler.java:263)
at org.apache.solr.handler.dataimport.DocBuilder.buildDocument(DocBuilder.java:511)
at org.apache.solr.handler.dataimport.DocBuilder.buildDocument(DocBuilder.java:415)
at org.apache.solr.handler.dataimport.DocBuilder.doFullDump(DocBuilder.java:330)
at org.apache.solr.handler.dataimport.DocBuilder.execute(DocBuilder.java:232)
at org.apache.solr.handler.dataimport.DataImporter.doFullImport(DataImporter.java:416)
at org.apache.solr.handler.dataimport.DataImporter.runCmd(DataImporter.java:480)
at org.apache.solr.handler.dataimport.DataImporter$1.run(DataImporter.java:461)
Caused by: java.lang.IllegalArgumentException: Illegal character .
at org.apache.solr.common.util.Base64.base64toInt(Base64.java:150)
at org.apache.solr.common.util.Base64.base64ToByteArray(Base64.java:117)
at org.apache.solr.schema.BinaryField.createField(BinaryField.java:89)
at org.apache.solr.schema.FieldType.createFields(FieldType.java:305)
at org.apache.solr.update.DocumentBuilder.addField(DocumentBuilder.java:48)
at org.apache.solr.update.DocumentBuilder.toDocument(DocumentBuilder.java:123)
... 18 more
I checked querying directly against database, and it's working fine. I am using SOLR 5, ojdbc7 and Java 8. How can I use the binary field correctly in SOLR?
Update
I've changed the properties of picture in schema.xml setting indexed=false. This way:
<!-- BLOB field -->
<field name="picture" type="binary" indexed="false" stored="true"/>
Then, I restarted SOLR, reloaded my core, and did a Full-Import again. No success and same exception. The same 64 documents that I described above was imported and the field picture does not appear in JSON response. The query I execute is:
/select?q=*%3A*&wt=json&indent=true
I messed with syntax q query:
if I write q=*:* - I see 2 results.
If I skip q - I have not see anything
if I write q=price:* - see 2 results
if I write q=price - 0 results
update
q=price:0 - 1 result
Can you explain differences between these queries?
especially I want to understand what does it mean 4 th variant ?
indexed documents:
add><doc>
<field name="id">3007WFP</field>
<field name="name">Dell Widescreen UltraSharp 3007WFP</field>
<field name="manu">Dell, Inc.</field>
<!-- Join -->
<field name="manu_id_s">dell</field>
<field name="cat">electronics</field>
<field name="cat">monitor</field>
<field name="features">30" TFT active matrix LCD, 2560 x 1600, .25mm dot pitch, 700:1 contrast</field>
<field name="includes">USB cable</field>
<field name="weight">401.6</field>
<field name="price">2199</field>
<field name="popularity">6</field>
<field name="inStock">true</field>
<!-- Buffalo store -->
<field name="store">43.17614,-90.57341</field>
<field name="cat">XXX</field>
</doc></add>
<add>
<doc>
<field name="id">SOLR1000</field>
<field name="name">Solr, the Enterprise Search Server</field>
<field name="manu">Apache Software Foundation</field>
<field name="cat">software</field>
<field name="cat">search</field>
<field name="cat">XXX</field>
<field name="features">Advanced Full-Text Search Capabilities using Lucene</field>
<field name="features">Optimized for High Volume Web Traffic</field>
<field name="features">Standards Based Open Interfaces - XML and HTTP</field>
<field name="features">Comprehensive HTML Administration Interfaces</field>
<field name="features">Scalability - Efficient Replication to other Solr Search Servers</field>
<field name="features">Flexible and Adaptable with XML configuration and Schema</field>
<field name="features">Good unicode support: héllo (hello with an accent over the e)</field>
<field name="price">0</field>
<field name="popularity">10</field>
<field name="inStock">true</field>
<field name="incubationdate_dt">2006-01-17T00:00:00.000Z</field>
</doc>
</add>
If you do not give the value it consider the default value. As in your fourth query
q=price means it searches the default searchable field having value "price"
That's why you are getting 0 result since no price is of 0 value.
I have a Solr schema that has a "url" field:
<fieldType name="url" class="solr.TextField"
positionIncrementGap="100">
</fieldType>
<fields>
<field name="id" type="string" stored="true" indexed="true"/>
<field name="url" type="url" stored="true" indexed="false"/>
<field name="chunkNum" type="long" stored="true" indexed="false"/>
<field name="origScore" type="float" stored="true" indexed="true"/>
<field name="concept" type="string" stored="true" indexed="true"/>
<field name="text" type="text" stored="true" indexed="true"
required="true"/>
<field name="title" type="text" stored="true" indexed="true"/>
<field name="origDoctype" type="string" stored="true" indexed="true"/>
<field name="keywords" type="string" stored="true" indexed="true"/>
</fields>
<uniqueKey>id</uniqueKey>
<defaultSearchField>text</defaultSearchField>
<solrQueryParser defaultOperator="OR"/>
I can add SolrInputDocuments with all the fields and query them back using the text field and/or with a filter query on "concept". But when I try to query a specific url, I don't get any results. My code looks like:
SolrQuery query = new SolrQuery();
query.setQuery("url:" + ClientUtils.escapeQueryChars(url));
//query.setQuery("*:*");
//query.addFilterQuery("url:" + ClientUtils.escapeQueryChars(url));
List<Chunk> retCode = null;
try
{
QueryResponse resp = solrServer.query(query);
SolrDocumentList docs = resp.getResults();
retCode = new ArrayList<Chunk>(docs.size());
for (SolrDocument doc : docs)
{
LOG.debug("got doc " + doc);
Chunk chunk = new Chunk(doc);
retCode.add(chunk);
}
}
catch (SolrServerException e)
{
LOG.error("caught a server exception", e);
}
return retCode;
I've tried with and without the ClientUtils.escapeQueryChars and I've tried using a query of "url:" or a filter query on url. I never get anything back. Any hints?
Whats the actual type of "url"? In your schema.xml you should have a set of "fieldType" elements which list the actual Solr backing classes and filters that make up a data type.
For your "fieldType" for the "url" you are interested in the "class" attribute. E.g. the most basic free-text type has a class="solr.TextField". You might be using a type that has some wacky filters on it and Lucene/Solr ends up indexing your data differently from what you would expect.
Download Luke and look at your index visually:
http://www.getopt.org/luke/
It will help you "look" at your data - like I said, maybe its stored differently than what you expect.
Dammit, another stupid one on my part: Thanks to Cody's suggestion of using Luke, I discovered this inconvenient part of the schema:
<field name="url" type="url" stored="true" indexed="false"/>
Changing that to indexed="true" fixed the problem.