Using BooleanQuery or write more indexes? - java

I have a category tree like this:
root_1
    sub_1
    sub_2
    ... up to sub_20
Every document belongs to one subcategory (for example, sub_2). Currently I write only the subcategory to the Lucene index:
new NumericField("category",...).setIntValue(sub_2.getID());
To get all of root_1's documents, should I search with a BooleanQuery that combines sub_1 through sub_20, or should I write an additional ancestor category to every document at index time:
new NumericField("category",...).setIntValue(sub_2.getID());
new NumericField("category",...).setIntValue(root_1.getID()); // sub_2's ancestor category
Which is the better choice?

I would use a path enumeration ('Dewey Decimal') representation of the category hierarchy. That is, instead of just storing 'sub_2' for the second child of the first root, store something like '001.002'. Index it as a single untokenized string term rather than as a NumericField, so that prefix and wildcard queries can match it.
To find the root and everything below it, you would search on "category:001*".
To find only what lies below the root (excluding the root itself), you would search on "category:001.*".
(Please also see How to store tree data in a Lucene/Solr/Elasticsearch index or a NoSQL db?.)
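
As a rough illustration of the path-enumeration idea, the sketch below builds fixed-width keys such as "001.002" and imitates the two prefix searches with plain string matching. It deliberately uses no Lucene classes; CategoryPaths, toPath and prefixSearch are hypothetical names invented for this example, and the three-digit width is an assumption.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class CategoryPaths {

    /** Builds a fixed-width path such as "001.002" from ordinal positions in the tree. */
    static String toPath(int... ordinals) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < ordinals.length; i++) {
            if (i > 0) {
                sb.append('.');
            }
            sb.append(String.format("%03d", ordinals[i]));
        }
        return sb.toString();
    }

    /** Imitates category:<prefix>* by returning the ids of documents whose path starts with prefix. */
    static List<String> prefixSearch(Map<String, String> docPaths, String prefix) {
        List<String> hits = new ArrayList<>();
        for (Map.Entry<String, String> e : docPaths.entrySet()) {
            if (e.getValue().startsWith(prefix)) {
                hits.add(e.getKey());
            }
        }
        return hits;
    }

    public static void main(String[] args) {
        Map<String, String> docPaths = new LinkedHashMap<>();
        docPaths.put("docA", toPath(1));     // filed directly under root_1
        docPaths.put("docB", toPath(1, 2));  // root_1 / sub_2
        docPaths.put("docC", toPath(2, 1));  // belongs to a different root

        System.out.println(prefixSearch(docPaths, "001"));   // prints [docA, docB]
        System.out.println(prefixSearch(docPaths, "001."));  // prints [docB]
    }
}
```

The fixed width matters: without zero padding, a prefix such as "1" would also match "10" and "11".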

Related

Java how to get rid of redundant code

In Java I parse an XML document. The XML is a purchase order, and from it I create a PO document in our ERP system.
I use a DOM parser to parse the XML.
So eventually I have code like this:
--this is an excerpt --
//ShipTo
Element shipToElement = CXMLHandlerObj.getChildElement(elementOrderRequestHeader, "ShipTo");
//Address
Element shipToAddressElement = CXMLHandlerObj.getChildElement(shipToElement, "Address");
/*get attributes of Address*/
notesHandlerObj.docOrder.replaceItemValue("ShipToParty_addressID", shipToAddressElement.getAttribute("addressID"));
notesHandlerObj.docOrder.replaceItemValue("ShipToParty_addressIDDomain", shipToAddressElement.getAttribute("addressIDDomain"));
notesHandlerObj.docOrder.replaceItemValue("ShipToParty_isoCountryCode", shipToAddressElement.getAttribute("isoCountryCode"));
The XML also contains, at the top, an OrderRequestHeader element with a type attribute:
<OrderRequestHeader orderDate="2017-04-04T12:00:00+00:00" orderID="4550144777" orderType="regular" orderVersion="1" type="new">
Below this element all the details of the order are found.
The type attribute can have the value "new" or "update".
The type is "new" when the PO XML is sent for the first time, and "update" when the same PO is sent again with changes in it.
Note that the XML structure is the same; only the type differs.
When the type is "new", I just parse the XML and create the PO document. When the type is "update", I want to check every element, update the existing document, and mail the changes.
The problem is that while parsing the XML I need to either create a new PO or update an existing one. I see two ways to do this:
1. Create two methods:
   a. create new PO
   b. update PO
   In the create method I parse the XML and copy the values from each element to the document.
   In the update method I parse all the elements again, but also check which data has changed.
2. Put an if/else statement before every element.
Both approaches are rather redundant. Is there a simpler way of doing this?
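
One possible way to avoid both kinds of duplication is to keep a single parsing pass and route every write through one small helper that knows whether it is creating or updating. The sketch below is only an illustration under assumptions: OrderFieldWriter is a hypothetical class, and a plain Map stands in for the ERP document (notesHandlerObj.docOrder in the excerpt above).

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Objects;

public class OrderFieldWriter {
    private final Map<String, String> doc;   // stand-in for the real PO document
    private final boolean update;            // true when the header says type="update"
    private final List<String> changes = new ArrayList<>();

    OrderFieldWriter(Map<String, String> doc, boolean update) {
        this.doc = doc;
        this.update = update;
    }

    /** Writes the value; in update mode it also records the change if the value differs. */
    void set(String item, String newValue) {
        String oldValue = doc.get(item);
        if (update && !Objects.equals(oldValue, newValue)) {
            changes.add(item + ": " + oldValue + " -> " + newValue);
        }
        doc.put(item, newValue);
    }

    /** Changes collected so far, e.g. for the notification mail. */
    List<String> changes() {
        return changes;
    }

    public static void main(String[] args) {
        Map<String, String> existing = new HashMap<>();
        existing.put("ShipToParty_isoCountryCode", "NL");

        OrderFieldWriter writer = new OrderFieldWriter(existing, true);
        writer.set("ShipToParty_isoCountryCode", "BE"); // same call for "new" and "update"
        System.out.println(writer.changes());
    }
}
```

With such a helper the parsing code would contain one call per attribute, regardless of the order type, and the list of changes is available afterwards for the mail.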

How to retrieve the Field that "hit" in Lucene

Maybe I'm really missing something.
I have indexed a bunch of key/value pairs in Lucene (v4.1 if it matters). Say I have
key1=value1 and key2=value2, e.g. as read from a properties file.
They get indexed both as specific fields and into a catchall "ALL" field, e.g.
new Field("key1", "value1", aFieldTypeMimickingKeywords);
new Field("key2", "value2", aFieldTypeMimickingKeywords);
new Field("ALL", "key1=value1", aFieldTypeMimickingKeywords);
new Field("ALL", "key2=value2", aFieldTypeMimickingKeywords);
// then get added to the Document of course...
I can then do a wildcard search, using
new WildcardQuery(new Term("ALL", "*alue1"));
and it will find the hit.
But it would be nice to get more information, such as the complete value (e.g. "key1=value1") that goes with that hit.
The best I can figure out is to get the Document, get its list of IndexableFields, then loop over all of them and check whether field.stringValue().contains("alue1"). (I can look at the data structures in the debugger, and all the information is there.)
This seems completely insane, because isn't that what Lucene just did? Shouldn't the hit information return some of the fields?
Is Lucene missing what seems like "obvious" functionality? Googling and staring at the APIs haven't revealed anything straightforward, but I feel like I must be searching for the wrong thing.

You might want to try the IndexSearcher.explain() method. Once you have the ID of the matching document, build a query for each field (using the same search keywords) and call Explanation.isMatch() on the result for each query: the ones that return true identify the matched fields. Example:
for (String field : fields) {
    Query query = new WildcardQuery(new Term(field, "*alue1"));
    Explanation ex = searcher.explain(query, docID);
    if (ex.isMatch()) {
        // The query matched this field
    }
}

How to create nested document in Solr indexing?

I want to create nested documents in Solr. I am using Java/GWT/SolrJ.
Currently I am indexing the following fields:
Items:
id  title  desc
1   xyz    xyzxyzxyz
2   pqr    pqrpqrpqr
3   abc    abcabcabc
Now I want to create one more document linked to each of the documents above. For example, item id 1 has a subdocument containing the following fields:
Item_User_Details (for item 1):
user  details
1     qweqweqwe
2     xyzxyzxyz
3     asdasdasd
In this way, each item id in the table above should have one linked item_user_details document.
How can I do this?
Thanks in advance.

In our schema we have a lot of related tables.
We decided to flatten all relations into one document. To achieve this we created a custom importer (using SolrJ), which loads each document from the index, adds the related fields, and writes the document back.
[edit]
We do this in the following way:
1. Export the data to a CSV file for each table (item, item_user_details).
2. Import each CSV file into Solr, starting with the top-level table (item in your case).
3. Start an embedded Solr server:
   System.setProperty("solr.solr.home", config.getSolrIndexPath());
   CoreContainer.Initializer initializer = new CoreContainer.Initializer();
   this.coreContainer = initializer.initialize();
   this.solr = new EmbeddedSolrServer(this.coreContainer, "");
   Alternatively, you can access a remote Solr instance:
   this.solr = new HttpSolrServer("http://[your-url]/solr");
4. Create a SolrDocument for each line in the file.
5. Add it to the index: this.solr.add(ClientUtils.toSolrInputDocument(doc));
6. Commit: this.solr.commit();
7. Load the documents from the index (items).
8. Identify the relations in the CSV file for item_user_details via the document id (item id).
9. Extend the loaded document with the fields from item_user_details.
10. Commit the document again.
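
The merge step at the heart of this procedure (find the parent by id, then add the child fields to it) can be sketched without any Solr classes. In the hedged example below, plain Maps stand in for the Solr documents and the CSV rows; FlattenRelations, the item_id column, and the user_N_details field naming are all assumptions made for illustration, not part of the importer described above.

```java
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class FlattenRelations {

    /**
     * Copies each child row's "details" value into its parent document as a
     * numbered field (user_1_details, user_2_details, ...).
     */
    static void flatten(Map<String, Map<String, Object>> itemsById,
                        List<Map<String, String>> userDetailRows) {
        for (Map<String, String> row : userDetailRows) {
            Map<String, Object> item = itemsById.get(row.get("item_id"));
            if (item == null) {
                continue; // skip rows whose parent item is not in the index
            }
            item.put("user_" + row.get("user") + "_details", row.get("details"));
        }
    }

    public static void main(String[] args) {
        Map<String, Object> item1 = new LinkedHashMap<>();
        item1.put("id", "1");
        item1.put("title", "xyz");

        Map<String, Map<String, Object>> items = new LinkedHashMap<>();
        items.put("1", item1);

        Map<String, String> row = new LinkedHashMap<>();
        row.put("item_id", "1");
        row.put("user", "1");
        row.put("details", "qweqweqwe");

        flatten(items, Collections.singletonList(row));
        System.out.println(item1); // prints {id=1, title=xyz, user_1_details=qweqweqwe}
    }
}
```

In a real importer the extended map would be converted back into a Solr input document and re-added before the final commit.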

dom4J: How to get the value of Elements of a Node?

I am reading an XML file with dom4j, using XPath to select the nodes I need. Suppose my XML looks like this:
<Employees>
    <Emp id="1">
        <name>jame</name>
        <age>12</age>
    </Emp>
    .
    .
    .
</Employees>
Now I need to store the information of all employees in a list of my Employee class. So far I have the following code:
List<? extends Node> lstprmntEmps = document.selectNodes("//Employees/Emp");
ArrayList<Employee> Employees = new ArrayList<Employee>(); // Employee is my custom class
for (Node node : lstprmntEmps)
{
    Employees.add(ParseEmployee(node)); // ParseEmployee(...) is my custom function that parses the Emp XML and returns an Employee object
}
Now how do I get the name and age of the currently selected node?
Is there a method such as node.getElementValue("name")?

Cast each node to Element, then ask the element for its first "name" sub-element and its first "age" sub-element, and get their text.
See http://dom4j.sourceforge.net/apidocs/org/dom4j/Element.html.
The elementText(String) method of Element appears to fetch a sub-element by name and return its text in one operation, but its Javadoc has no description, so it's hard to say for certain.
Note that, by convention, variable and method names in Java should start with a lowercase letter.

retrieving the values from the nested hashmap

I have an XML file with many copies of the table node structure below:
<databasetable TblID="123" TblName="Department1_mailbox">
    <SelectColumns>
        <Slno>dept1_slno</Slno>
        <To>dept1_to</To>
        <From>dept1_from</From>
        <Subject>dept1_sub</Subject>
        <Body>dept1_body</Body>
        <BCC>dept1_BCC</BCC>
        <CC>dept1_CC</CC>
    </SelectColumns>
    <WhereCondition>MailSentStatus='New'</WhereCondition>
    <UpdateSuccess>
        <MailSentStatus>'Yes'</MailSentStatus>
        <MailSentFailedReason>'Mail Sent Successfully'</MailSentFailedReason>
    </UpdateSuccess>
    <UpdateFailure>
        <MailSentStatus>'No'</MailSentStatus>
        <MailSentFailedReason>'Mail Sending Failed'</MailSentFailedReason>
    </UpdateFailure>
</databasetable>
Since it is inefficient to traverse the file every time the program needs the details of a node for its queries, I used nested hashmaps to store the details while traversing the XML file the first time. The structure I used is as follows:
MapMaster
Key   Value
123   MapDetails
      Key             Value
      TblName         Department1_mailbox
      SelectColumns   mapSelect
                      Key      Value
                      Slno     dept1_slno
                      To       dept1_to
                      From     dept1_from
                      Subject  dept1_sub
                      Body     dept1_body
                      BCC      dept1_BCC
                      CC       dept1_CC
      WhereCondition  MailSentStatus='New'
      UpdateSuccess   mapUS
                      MailSentStatus        'Yes'
                      MailSentFailedReason  'Mail Sent Successfully'
      UpdateFailure   mapUF
                      MailSentStatus        'No'
                      MailSentFailedReason  'Mail Sending Failed'
The problem I'm facing now is retrieving a value through the nested keys. For example,
if I need the value of the Slno key, I have to specify TblID, SelectColumns and Slno in nested form, like this:
String slno = (String) ((HashMap) ((HashMap) mapMaster.get("123")).get("SelectColumns")).get("Slno");
This is inconvenient to use in a program. Please suggest a solution, but please don't just tell me that iterators are available, because I have to fetch individual values from the map as my program needs them.
EDIT: My program has to fetch the IDs of the departments that have the privilege to send mail, and these IDs are then compared with the IDs in the XML file. Only the information for IDs that match is fetched from the XML. That is all my program does. Please help.
Thanks in advance,
Vishu

Never cast to a specific Map implementation. Cast to the Map interface instead, i.e.
((Map)one.get("foo")).get("bar")
Better still, do not use casting at all in your case. You can declare the collections using generics, so the compiler will do the work for you:
Map<String, Map<String, Integer>> one = new HashMap<String, Map<String, Integer>>();
Map<String, Integer> two = new HashMap<String, Integer>();
Now you can say:
int n = one.get("foo").get("bar");
No casting, no problems.
But the better solution is not to use nested maps at all. Create your own classes such as SelectColumns, WhereCondition, etc. Each class should have the appropriate private fields, getters and setters. Then parse your XML, creating instances of these classes, and use the getters to traverse the data structure.
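
A minimal sketch of what such classes might look like follows. The class and field names (TableConfigDemo, DatabaseTable, and the two columns shown) are assumptions chosen to mirror the XML in the question, and the XML parsing that would populate them is left out.

```java
import java.util.HashMap;
import java.util.Map;

public class TableConfigDemo {

    /** Mirrors <SelectColumns>; only two columns are shown to keep the sketch short. */
    static class SelectColumns {
        private String slno;
        private String to;

        String getSlno() { return slno; }
        void setSlno(String slno) { this.slno = slno; }
        String getTo() { return to; }
        void setTo(String to) { this.to = to; }
    }

    /** Mirrors one <databasetable> element. */
    static class DatabaseTable {
        private String tblName;
        private String whereCondition;
        private final SelectColumns selectColumns = new SelectColumns();

        String getTblName() { return tblName; }
        void setTblName(String tblName) { this.tblName = tblName; }
        String getWhereCondition() { return whereCondition; }
        void setWhereCondition(String whereCondition) { this.whereCondition = whereCondition; }
        SelectColumns getSelectColumns() { return selectColumns; }
    }

    public static void main(String[] args) {
        DatabaseTable table = new DatabaseTable();
        table.setTblName("Department1_mailbox");
        table.setWhereCondition("MailSentStatus='New'");
        table.getSelectColumns().setSlno("dept1_slno");

        // One map keyed by TblID is still useful for looking up a table by department id.
        Map<String, DatabaseTable> tablesById = new HashMap<>();
        tablesById.put("123", table);

        String slno = tablesById.get("123").getSelectColumns().getSlno();
        System.out.println(slno); // prints dept1_slno
    }
}
```

The lookup now reads as a chain of typed getters, so a misspelled key becomes a compile error instead of a null at run time.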
BTW, if you are willing to use JAXB, you have to do almost nothing yourself. Something like the following:
Unmarshaller u = JAXBContext.newInstance(SelectColumns.class, WhereCondition.class).createUnmarshaller();
SelectColumns[] columns = (SelectColumns[])u.unmarshal(in);

One approach would be to generate fully qualified keys that contain the XML path to the element or attribute. These keys would be unique, could be stored in a single HashMap, and would get you to the element quickly.
Your code would simply have to generate a unique textual representation of the path, then store and retrieve the XML element by that key.
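
A small sketch of this flat-key idea, assuming the path segments are joined with "/" and that string values are stored rather than XML elements. PathKeyMap and its methods are hypothetical names used only for illustration.

```java
import java.util.HashMap;
import java.util.Map;

public class PathKeyMap {
    private final Map<String, String> values = new HashMap<>();

    /** Joins the path segments into one key, e.g. "123/SelectColumns/Slno". */
    static String key(String... path) {
        return String.join("/", path);
    }

    /** Stores a value under the fully qualified path. */
    void put(String value, String... path) {
        values.put(key(path), value);
    }

    /** Returns the value stored under the path, or null if there is none. */
    String get(String... path) {
        return values.get(key(path));
    }

    public static void main(String[] args) {
        PathKeyMap map = new PathKeyMap();
        map.put("dept1_slno", "123", "SelectColumns", "Slno");
        map.put("'Yes'", "123", "UpdateSuccess", "MailSentStatus");

        System.out.println(map.get("123", "SelectColumns", "Slno")); // prints dept1_slno
    }
}
```

The separator must be a character that cannot occur in an element name or table ID; otherwise two different paths could produce the same key.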
