How to create nested document in Solr indexing? - java

I want to create nested documents in Solr. I am using Java/GWT/SolrJ.
Currently I am indexing the following fields:
Items:
id  title  desc
1   xyz    xyzxyzxyz
2   pqr    pqrpqrpqr
3   abc    abcabcabc
But now I want to create one more document linked with each document from above, i.e. for id 1 there is one subdocument which contains the following fields:
Item_User_Details:
for item 1:
user  details
1     qweqweqwe
2     xyzxyzxyz
3     asdasdasd
In this way I want to create, for each item id from the table above, one linked document of item_user_details.
How can I do this?
Thanks in advance.

In our schema we have a lot of related tables.
We decided to flatten all relations into one document. To achieve this we created a custom importer (using SolrJ) which loads each document from the index, adds the related fields, and writes the document back.
[edit]
We do this in the following way:
Export the data into a CSV file for each table (item, item_user_details).
Import each CSV file into Solr, starting with the top-level table (item in your case).
Start an embedded Solr server:
System.setProperty("solr.solr.home", config.getSolrIndexPath());
CoreContainer.Initializer initializer = new CoreContainer.Initializer();
this.coreContainer = initializer.initialize();
this.solr = new EmbeddedSolrServer(this.coreContainer, "");
Alternatively, you can access a remote Solr instance:
this.solr = new HttpSolrServer("http://[your-url]/solr");
Create a SolrDocument for each line in the file.
Add it to the index: this.solr.add(ClientUtils.toSolrInputDocument(doc));
Commit: this.solr.commit();
Load the documents from the index (items).
Identify the relations in the CSV file for item_user_details via the document id (the item id).
Extend the loaded document with the fields from item_user_details.
Commit the document again (see the sketch below).
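A minimal SolrJ sketch of those last steps, assuming an id unique key and a multivalued item_user_details field (both names are placeholders for your schema), and assuming every field is stored so the loaded document can be written back unchanged:

import java.util.List;

import org.apache.solr.client.solrj.SolrQuery;
import org.apache.solr.client.solrj.SolrServer;
import org.apache.solr.client.solrj.util.ClientUtils;
import org.apache.solr.common.SolrDocument;
import org.apache.solr.common.SolrInputDocument;

public class ItemUserDetailsMerger {

    private final SolrServer solr; // EmbeddedSolrServer or HttpSolrServer

    public ItemUserDetailsMerger(SolrServer solr) {
        this.solr = solr;
    }

    // Loads the item document with the given id, adds the related
    // item_user_details values and writes the document back.
    public void mergeUserDetails(String itemId, List<String> userDetails) throws Exception {
        SolrQuery query = new SolrQuery("id:" + itemId);
        SolrDocument stored = solr.query(query).getResults().get(0);

        // Convert the stored document back into an input document
        SolrInputDocument doc = ClientUtils.toSolrInputDocument(stored);

        // Add each user detail as a value of the multivalued field
        for (String detail : userDetails) {
            doc.addField("item_user_details", detail);
        }

        // Re-adding a document with the same unique key replaces the old one
        solr.add(doc);
        solr.commit();
    }
}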

Related

How to read attributes out of multiple nested documents in MongoDB Java?

I need some help with a project I am planning to do. At this stage I am trying to learn how to use NoSQL databases in Java.
I've got a few nested documents looking like this:
[Image: MongoDB nesting structure]
As you can see in the image, my inner attributes are "model" and "construction".
Now I need to iterate through all the documents in my collection, whose key names are unknown because they are generated at runtime when a user enters some information.
At the end I need to list them in a TreeView, keeping the structure they already have in the database.
What I've tried is getting keySets from the documents, but I cannot get past the second layer of the structure. I am able to print the whole object in JSON format, but I cannot access specific attributes like "model" or "construction".
MongoCollection<Document> collection = mongoDatabase.getCollection("test");
MongoCursor<Document> cursor = collection.find().iterator();
while (cursor.hasNext()) {
    Document document = cursor.next();
    for (String key : document.keySet()) {
        Document vehicles = (Document) document.get(key);
        //System.out.println(key);
        //System.out.println(document.get(key));
    }
}

Document cars = (Document) vehicle.get("cars");
Document types = (Document) cars.get("coupes");
Document brands = (Document) types.get("Ford");
Document model = (Document) brands.get("Mustang GT");
Here I tried to get some properties by hardcoding the key names of the documents, but I can't seem to get any value either. It keeps telling me that it could not read from vehicle, because it is null.
Most tutorials and forum posts somehow do not work for me. I don't know if they use another version of the MongoDB driver; mine is mongodb-driver 3.12.7, if that helps in any way.
I have been trying to get this working for days now and it is driving me crazy.
I hope there is anyone out there who is able to help me with this problem.
Here is a way you can try using the Document class's methods. You use the Document#getEmbedded method to navigate the embedded (or sub-document) document's path.
try (MongoCursor<Document> cursor = collection.find().iterator()) {
    while (cursor.hasNext()) {
        // Get a document
        Document doc = cursor.next();
        // Get the sub-document with the known key path "vehicles.cars.coupes"
        Document coupes = doc.getEmbedded(
                Arrays.asList("vehicles", "cars", "coupes"),
                Document.class);
        // For each of the sub-documents within the "coupes" get the
        // dynamic keys and their values.
        for (Map.Entry<String, Object> coupe : coupes.entrySet()) {
            System.out.println(coupe.getKey()); // e.g., Mercedes
            // The dynamic sub-document for the dynamic key (e.g., Mercedes):
            // {"S-Class": {"model": "S-Class", "construction": "2011"}}
            Document coupeSubDoc = (Document) coupe.getValue();
            // Get the coupeSubDoc's keys and values
            coupeSubDoc.keySet().forEach(k -> {
                System.out.println("\t" + k); // e.g., S-Class
                System.out.println("\t\t" + "model" + " : " +
                        coupeSubDoc.getEmbedded(Arrays.asList(k, "model"), String.class));
                System.out.println("\t\t" + "construction" + " : " +
                        coupeSubDoc.getEmbedded(Arrays.asList(k, "construction"), String.class));
            });
        }
    }
}
The above code prints to the console as:
Mercedes
S-Class
model : S-Class
construction : 2011
Ford
Mustang
model : Mustang GT
construction : 2015
I think this is not the complete answer to his question.
Here he says:
Now I need to iterate through all the documents in my collection, whose key names are unknown because they are generated at runtime when a user enters some information.
Your answer, @prasad_, just refers to his case with vehicles, cars and so on. He needs a way to handle unknown key/value pairs, I guess. For example, in this case he only knows the keys vehicle, cars, coupes, Mercedes/Ford and their subkeys. If another user inserts some new key/value pairs into the collection, he will have problems because he can't navigate through the new document without looking into the database.
I'm also interested in the solution, because I have never nested my key/value pairs and can't see the advantage of it. Am I wrong, or does it just make the programming more difficult?
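For the case where none of the key names are known in advance, one possible approach (a sketch only; the class and method names are made up for illustration) is to walk each Document recursively. Document implements Map<String, Object>, so any value that is itself a Document can be descended into, which is also how a TreeView could be populated level by level:

import com.mongodb.client.MongoCollection;
import com.mongodb.client.MongoCursor;
import org.bson.Document;

public class DocumentTreePrinter {

    // Prints every key of every document, indented by nesting depth,
    // without knowing any key names in advance.
    public static void printAll(MongoCollection<Document> collection) {
        try (MongoCursor<Document> cursor = collection.find().iterator()) {
            while (cursor.hasNext()) {
                printTree(cursor.next(), 0);
            }
        }
    }

    private static void printTree(Document doc, int depth) {
        StringBuilder indent = new StringBuilder();
        for (int i = 0; i < depth; i++) {
            indent.append("  ");
        }
        for (String key : doc.keySet()) {
            Object value = doc.get(key);
            if (value instanceof Document) {
                // Inner document: print the key and descend one level
                System.out.println(indent + key);
                printTree((Document) value, depth + 1);
            } else {
                // Leaf value such as "model" or "construction"
                System.out.println(indent + key + " : " + value);
            }
        }
    }
}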

Java how to get rid of redundant code

In Java I parse an XML document. This XML is a purchase order, and from it I create a PO document in our ERP system.
I use a DOM parser to parse the XML.
So eventually I have code like this:
-- this is an excerpt --
//ShipTo
Element shipToElement = CXMLHandlerObj.getChildElement(elementOrderRequestHeader, "ShipTo");
//Address
Element shipToAddressElement = CXMLHandlerObj.getChildElement(shipToElement, "Address");
/*get attributes of Address*/
notesHandlerObj.docOrder.replaceItemValue("ShipToParty_addressID", shipToAddressElement.getAttribute("addressID"));
notesHandlerObj.docOrder.replaceItemValue("ShipToParty_addressIDDomain", shipToAddressElement.getAttribute("addressIDDomain"));
notesHandlerObj.docOrder.replaceItemValue("ShipToParty_isoCountryCode", shipToAddressElement.getAttribute("isoCountryCode"));
But the XML also contains at the top an OrderRequestHeader element which has a type attribute:
<OrderRequestHeader orderDate="2017-04-04T12:00:00+00:00" orderID="4550144777" orderType="regular" orderVersion="1" type="new">
Below this element all the details of the order are found.
The "type" attribute can have values such as "new" or "update".
The type will be "new" if the PO XML is sent for the first time, and "update" if the same PO is sent again with changes contained within it.
Note that the XML structure is the same; only the type is different.
When the type is "new", I just parse the XML and create the PO document. But if the type is "update", I want to check every element, update the document, and mail the changes accordingly.
Now the problem is that while parsing the XML I need to either create a new PO or update an existing one. I can do this in the following ways:
1. Create two methods:
   1. create new PO
   2. update PO
   In the create method I parse the XML and add the values from the elements to the document.
   In the update method I parse all elements again, but also check which data has changed.
2. Put an if/else statement before every element.
Both approaches are somewhat redundant; is there any simpler way of doing this (see the sketch below)?
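One possible way to reduce the duplication described above, sketched under assumptions: copyAttribute, readExistingValue and changedFields are hypothetical names, while docOrder and replaceItemValue come from the excerpt. The idea is to move the attribute-to-field copy into a single helper that only records differences when the type is "update":

import java.util.List;
import org.w3c.dom.Element;

// Hypothetical helper: copies one XML attribute into the PO document and,
// for updates, remembers what changed so the changes can be mailed later.
private void copyAttribute(Element source, String attribute, String fieldName,
                           boolean isUpdate, List<String> changedFields) {
    String newValue = source.getAttribute(attribute);
    if (isUpdate) {
        String oldValue = readExistingValue(fieldName); // hypothetical lookup of the stored value
        if (!newValue.equals(oldValue)) {
            changedFields.add(fieldName + ": '" + oldValue + "' -> '" + newValue + "'");
        }
    }
    notesHandlerObj.docOrder.replaceItemValue(fieldName, newValue);
}

// Usage, replacing the repeated replaceItemValue calls:
// copyAttribute(shipToAddressElement, "addressID", "ShipToParty_addressID", isUpdate, changedFields);
// copyAttribute(shipToAddressElement, "addressIDDomain", "ShipToParty_addressIDDomain", isUpdate, changedFields);
// copyAttribute(shipToAddressElement, "isoCountryCode", "ShipToParty_isoCountryCode", isUpdate, changedFields);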

query search on an index with two different document types

I have an index in which there are heterogeneous documents. These documents have only one common field (a personal id), for example:
DOC
id: 7
content: this example content doc has a long text
type: content
DOC
id: 7
title: example doc
public: yes
type: metadata
I've chosen this solution because I want to manage the long-text documents separately from the metadata documents.
If I perform a query like this:
+(content: example title: example) +public: yes
Lucene correctly returns the document of type "metadata" with id 7, but if I perform this other one:
+(content: long) +public: yes
Lucene doesn't return the document, because the clause +public: yes (necessary for my application) refers to a field that does not exist in the "content"-type document.
My question: how can I ask Lucene to give back the "content" document whose companion document (the one with the same id) has the "public" field set to "yes", with only a single query?
Sorry for my English, thanks to all.
Would it work if you didn't make the 'public' field required? So:
+(content: long) public: yes
Or can you strip the 'public' field out of the query string before submitting it to Lucene?
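A minimal sketch of that relaxed query built programmatically, using the pre-5.0 mutable BooleanQuery API (newer Lucene versions use BooleanQuery.Builder); note that this only makes the public clause optional, it does not actually join the two documents by id:

import org.apache.lucene.index.Term;
import org.apache.lucene.search.BooleanClause.Occur;
import org.apache.lucene.search.BooleanQuery;
import org.apache.lucene.search.TermQuery;

// The content/title clauses stay required as a group, while the public clause
// becomes optional, so "content"-type documents (which have no "public" field)
// can still match.
BooleanQuery contentOrTitle = new BooleanQuery();
contentOrTitle.add(new TermQuery(new Term("content", "long")), Occur.SHOULD);
contentOrTitle.add(new TermQuery(new Term("title", "long")), Occur.SHOULD);

BooleanQuery query = new BooleanQuery();
query.add(contentOrTitle, Occur.MUST);
query.add(new TermQuery(new Term("public", "yes")), Occur.SHOULD);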

Adding entities to solr using solrj and schema.xml

I would like to add entities to documents like you can do with data-config.xml.
At the moment I'm indexing every page of my documents as a single document.
Now:
<solrDoc>
<id>1</id>
<docname>test.pdf</docname>
<pagenumber>1</pagenumber>
<pagecontent>blablabla</pagecontent>
</solrDoc>
<solrDoc>
<id>2</id>
<docname>test.pdf</docname>
<pagenumber>2</pagenumber>
<pagecontent>blablabla</pagecontent>
</solrDoc>
As you can see the data related to the document is stored x pages times. I would like to get documents like this:
<doc>
<id>1</id>
<docname>test.pdf</docname>
<pageEntries> //multivaluefield
<pageEntry><pagenumber>1</pagenumber><pagecontent>blablabla</pagecontent></pageEntry>
<pageEntry><pagenumber>2</pagenumber><pagecontent>blablabla</pagecontent></pageEntry>
</pageEntries>
</doc>
I don't know how to make something like pageEntry. I saw that Solr can import entities from databases, but I'm wondering how I can do the same (or something similar)?
I'm using Solr 3.6.1. The page extraction is done by myself using PDFBox.
Java code:
SolrInputDocument solrDoc = new SolrInputDocument();
solrDoc.setField("id", 1);
solrDoc.setField("filename", "test");
for (int p : pages) {
solrDoc.addField("page", p);
}
for (String pc : pagecont) {
solrDoc.addField("pagecont", pc);
}
If the extraction is performed by you, you can club all the pages together and feed them as a single Solr document, with pagenumber & pagecontent being multivalued fields (see the sketch below).
Alternatively, you can use the same id for all the pages (with the id not being the unique key in the schema definition) and use Grouping (Field Collapsing) to group the results per document.
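A minimal sketch of the first suggestion, assuming pagenumber and pagecontent are declared multivalued in schema.xml and that the i-th entry of each list belongs to the same page (the method and variable names are illustrative):

import java.util.List;

import org.apache.solr.common.SolrInputDocument;

// Builds one document per PDF instead of one per page.
SolrInputDocument buildDocument(String docname, List<Integer> pages, List<String> pagecont) {
    SolrInputDocument solrDoc = new SolrInputDocument();
    solrDoc.setField("id", docname);      // one id per file, not per page
    solrDoc.setField("docname", docname);
    for (int i = 0; i < pages.size(); i++) {
        solrDoc.addField("pagenumber", pages.get(i));     // multivalued
        solrDoc.addField("pagecontent", pagecont.get(i)); // multivalued
    }
    return solrDoc;
}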

Using BooleanQuery or write more indexes?

A category tree like this:
root_1
sub_1
sub_2
... to sub_20
Every document has a sub-category (like sub_2). Currently I only write sub_2 into the Lucene index:
new NumericField("category",...).setIntValue(sub_2.getID());
I want to get all of root_1's documents. Should I use a BooleanQuery (merging sub_1 to sub_20) to search, or should I also write the ancestor category into every document:
new NumericField("category",...).setIntValue(sub_2.getID());
new NumericField("category",...).setIntValue(root_1.getID());//sub_2's ancestor category
Which is the better choice?
I would use a path enumeration/'Dewey Decimal' representation of the category hierarchy. That is, instead of just storing 'sub_2' for the second child of the first root, store instead something like '001.002'.
To find the root and all of its children, you would search on "category:001*".
To find only the children of the root, you would search on "category:001.*".
(Please also see How to store tree data in a Lucene/Solr/Elasticsearch index or a NoSQL db?.)
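A minimal sketch of that encoding with the Lucene 3.x field API used in the question (the path values and the NOT_ANALYZED choice are assumptions; the category field name comes from the question):

import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;
import org.apache.lucene.index.Term;
import org.apache.lucene.search.PrefixQuery;

// Index time: store the full category path, e.g. "001.002" for sub_2 under root_1.
Document doc = new Document();
doc.add(new Field("category", "001.002", Field.Store.YES, Field.Index.NOT_ANALYZED));

// Query time: a prefix query on "001." matches every descendant of root_1,
// while a prefix query on "001" also matches the root itself.
PrefixQuery childrenOnly = new PrefixQuery(new Term("category", "001."));
PrefixQuery rootAndChildren = new PrefixQuery(new Term("category", "001"));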
