How to Use SynonymMap in CustomAnalyzer of Lucene 6.2.0 - java

I do not want to write my own Analyzer class. I have seen the new CustomAnalyzer feature in Apache Lucene, which lets you build your own custom analyzer:
Analyzer ana = CustomAnalyzer.builder(Paths.get(index))
    .withTokenizer(StandardTokenizerFactory.class)
    .addTokenFilter(LowerCaseFilterFactory.class)
    .addTokenFilter(StandardFilterFactory.class)
    .build();
Here I want to add one more filter via addTokenFilter(SynonymFilter.class), passing in the defaults (the SynonymMap, the token stream, and so on). I have seen that a filter can be configured with parameters this way, as StopFilter is here:
addTokenFilter(StopFilterFactory.class, "ignoreCase", "false", "words", "stopwords.txt", "format", "wordset")
So my question is: is it possible to use a SynonymMap inside a custom analyzer, and if yes, how?
The synonym mapping is set up like this:
HashMap<String, String> synonymMap = new HashMap<String, String>(10);
synonymMap.put("synonyms", "Facebook");
And I am building the SynonymMap like this:
SynonymMap.Builder builder = new SynonymMap.Builder(true);
builder.add(new CharsRef("Facebook"), new CharsRef("YearBook,FaceB00k"), true);
builder.add(new CharsRef("Facebook1"), new CharsRef("Fraud"), false);
builder.add(new CharsRef("Suzie"), new CharsRef("Susan"), false);
SynonymMap map = null;
try {
    map = builder.build();
} catch (IOException e) {
    e.printStackTrace();
}
Analyzer ana = CustomAnalyzer.builder(Paths.get(index))
    .withTokenizer(StandardTokenizerFactory.class)
    .addTokenFilter(StandardFilterFactory.class)
    .addTokenFilter(LowerCaseFilterFactory.class)
    .addTokenFilter(SynonymFilterFactory.class, synonymMap)
    .build();
It gives me this error:
Exception in thread "main" java.io.IOException: Resource not found: Facebook
at org.apache.lucene.analysis.util.ClasspathResourceLoader.openResource(ClasspathResourceLoader.java:67)
Thanks in advance.

The path passed to the builder is not the index directory; it is where the analyzer should look for configuration resources. Likewise, the second argument of addTokenFilter is a list of string parameters, not a synonym mapping, which is why your HashMap entry ("synonyms", "Facebook") makes Lucene try to open a resource named Facebook.
What you want to do is put your list of synonyms into a file in that configuration directory, and pass that file name as the "synonyms" parameter in your addTokenFilter call (along with any other parameters you may need).
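A minimal sketch, assuming a conf directory containing a synonyms.txt file in the Solr synonym format (both names are illustrative):
// synonyms.txt, placed in the conf directory:
//   Facebook, YearBook, FaceB00k
//   Suzie => Susan
Analyzer ana = CustomAnalyzer.builder(Paths.get("conf")) // config dir, not the index dir
    .withTokenizer(StandardTokenizerFactory.class)
    .addTokenFilter(StandardFilterFactory.class)
    .addTokenFilter(LowerCaseFilterFactory.class)
    .addTokenFilter(SynonymFilterFactory.class,
        "synonyms", "synonyms.txt", // resolved against the config dir
        "ignoreCase", "true",
        "expand", "true")
    .build();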


How to automate function according to Array list size

I'm sorry that the question title is not 100% correct; let me explain my scenario here.
I created a function to merge 4 data sets into one return format, because that is the format the front-end needs. It is working fine now.
public ReturnFormat makeThribleLineChart(List<NameCountModel> totalCount, List<NameCountModel> p1Count, List<NameCountModel> p2Count, List<NameCountModel> average) {
    ReturnFormat returnFormat = new ReturnFormat(null, null);
    try {
        String[] totalData = new String[totalCount.size()];
        String[] p1Data = new String[p1Count.size()];
        String[] p2Data = new String[p2Count.size()];
        String[] averageData = new String[p2Count.size()];
        String[] lableList = new String[totalCount.size()];
        for (int x = 0; x < totalCount.size(); x++) {
            totalData[x] = totalCount.get(x).getCount();
            p1Data[x] = p1Count.get(x).getCount();
            p2Data[x] = p2Count.get(x).getCount();
            averageData[x] = average.get(x).getCount();
            lableList[x] = totalCount.get(x).getName();
        }
        FormatHelper<String[]> totalFormatHelper = new FormatHelper<String[]>();
        totalFormatHelper.setData(totalData);
        totalFormatHelper.setType("line");
        totalFormatHelper.setLabel("Uudet");
        totalFormatHelper.setyAxisID("y-axis-1");
        FormatHelper<String[]> p1FormatHelper = new FormatHelper<String[]>();
        p1FormatHelper.setData(p1Data);
        p1FormatHelper.setType("line");
        p1FormatHelper.setLabel("P1 päivystykseen heti");
        FormatHelper<String[]> p2FormatHelper = new FormatHelper<String[]>();
        p2FormatHelper.setData(p2Data);
        p2FormatHelper.setType("line");
        p2FormatHelper.setLabel("P2 päivystykseen muttei yöllä");
        FormatHelper<String[]> averageFormatHelper = new FormatHelper<String[]>();
        averageFormatHelper.setData(averageData);
        averageFormatHelper.setType("line");
        averageFormatHelper.setLabel("Jonotusaika keskiarvo");
        averageFormatHelper.setyAxisID("y-axis-2");
        List<FormatHelper<String[]>> formatHelpObj = new ArrayList<FormatHelper<String[]>>();
        formatHelpObj.add(totalFormatHelper);
        formatHelpObj.add(p1FormatHelper);
        formatHelpObj.add(p2FormatHelper);
        formatHelpObj.add(averageFormatHelper);
        returnFormat.setData(formatHelpObj);
        returnFormat.setLabels(lableList);
        returnFormat.setMessage(Messages.Success);
        returnFormat.setStatus(ReturnFormat.Status.SUCCESS);
    } catch (Exception e) {
        returnFormat.setData(null);
        returnFormat.setMessage(Messages.InternalServerError);
        returnFormat.setStatus(ReturnFormat.Status.ERROR);
    }
    return returnFormat;
}
As you can see, all the formatting is hardcoded. So my question is how to generalize this code over the number of lists. Suppose next time I have to create chart formatting for five datasets; I would have to write yet another function for it. That is the duplication I want to reduce. I hope you can understand my question.
Thank you.
You're trying to solve the more general problem of composing a result object (in this case ReturnFormat) from dynamic information. In addition, some metadata is set up along with each dataset: the type, the label, etc. In the example you've posted, you've hardcoded the relationship between a dataset and this metadata, but you need some way to establish this relationship dynamically if you have a variable number of parameters.
Therefore, you have a couple of options:
Make makeThribleLineChart a varargs method that accepts a variable number of parameters representing your data. Now you have the problem of associating metadata with your parameters; the best option is probably to wrap the data and metadata together in a new object, one instance of which is provided as each param of makeThribleLineChart.
So you'll end up with a signature that looks a bit like ReturnFormat makeThribleLineChart(DataMetadataWrapper... allDatasets), where DataMetadataWrapper contains everything required to build one FormatHelper instance, as in the sketch below.
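A hypothetical sketch of that wrapper (the class and field names are illustrative, not an existing API):
// Pairs one dataset with the presentation metadata that was previously hardcoded.
class DataMetadataWrapper {
    final List<NameCountModel> data;
    final String type;    // e.g. "line"
    final String label;   // e.g. "Uudet"
    final String yAxisId; // e.g. "y-axis-1"; may be null

    DataMetadataWrapper(List<NameCountModel> data, String type, String label, String yAxisId) {
        this.data = data;
        this.type = type;
        this.label = label;
        this.yAxisId = yAxisId;
    }
}

// Varargs version: build one FormatHelper per wrapper inside a loop.
ReturnFormat makeThribleLineChart(DataMetadataWrapper... allDatasets) { ... }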
Use a builder pattern, similar to the collection builders in Guava, for example something like this:
class ThribbleLineChartBuilder {
    List<FormatHelper<String[]>> formatHelpObj = new ArrayList<>();

    ThribbleLineChartBuilder addDataSet(String describeType, String label, String yAxisId, List<NameCountModel> data) {
        String[] dataArray = ... ; // build your array of data
        FormatHelper<String[]> formatHelper = new FormatHelper<String[]>();
        formatHelper.setData(dataArray);
        formatHelper.setType(describeType);
        ... // set any other parameters that the FormatHelper requires here
        formatHelpObj.add(formatHelper);
        return this;
    }

    ReturnFormat build() {
        ReturnFormat returnFormat = new ReturnFormat(null, null);
        returnFormat.setData(this.formatHelpObj);
        ... // set up any other fields you need in ReturnFormat
        return returnFormat;
    }
}
// usage:
new ThribbleLineChartBuilder()
    .addDataSet("line", "Uudet", "y-axis-1", totalCount)
    .addDataSet("line", "P1 päivystykseen heti", null, p1Count)
    ... // set up your other data sources
    .build()

Alfresco buildonly indexer for searching properties created on the fly

I am using Alfresco 5.1, the latest version.
One of my requirements is to create properties (key/value pairs) where the user enters both the key and the value.
So I have done that like this:
Map<QName, Serializable> props = new HashMap<QName, Serializable>();
props.put(QName.createQName("customProp1"), "prop1");
props.put(QName.createQName("customProp2"), "prop2");
ChildAssociationRef associationRef = nodeService.createNode(nodeService.getRootNode(storeRef),
        ContentModel.ASSOC_CHILDREN, QName.createQName(GUID.generate()),
        ContentModel.TYPE_CMOBJECT, props);
Now I want to search for nodes with these newly created properties. I was able to search for such a property like this:
public List<NodeRef> findNodes() throws Exception {
    authenticate("admin", "admin");
    StoreRef storeRef = new StoreRef(StoreRef.PROTOCOL_WORKSPACE, "SpacesStore");
    List<NodeRef> nodeList = null;
    Map<QName, Serializable> props = new HashMap<QName, Serializable>();
    props.put(QName.createQName("customProp1"), "prop1");
    props.put(QName.createQName("customProp2"), "prop2");
    ChildAssociationRef associationRef = nodeService.createNode(nodeService.getRootNode(storeRef),
            ContentModel.ASSOC_CHILDREN, QName.createQName(GUID.generate()),
            ContentModel.TYPE_CMOBJECT, props);
    NodeRef nodeRef = associationRef.getChildRef();
    String query = "#cm\\:customProp1:\"prop1\"";
    SearchParameters sp = new SearchParameters();
    sp.addStore(storeRef);
    sp.setLanguage(SearchService.LANGUAGE_LUCENE);
    sp.setQuery(query);
    try {
        ResultSet results = serviceRegistry.getSearchService().query(sp);
        nodeList = new ArrayList<NodeRef>();
        for (ResultSetRow row : results) {
            nodeList.add(row.getNodeRef());
            System.out.println(row.getNodeRef());
        }
        System.out.println(nodeList.size());
    } catch (Exception e) {
        e.printStackTrace();
    }
    return nodeList;
}
The indexer configuration in alfresco-global.properties is:
index.subsystem.name=buildonly
index.recovery.mode=AUTO
dir.keystore=${dir.root}/keystore
Now my questions are:
Is it possible to achieve the same using the solr4 indexer?
Or is there any way to use the buildonly indexer for a particular query?
In your query
String query = "#cm\\:customProp1:\"prop1\"";
remove the cm prefix: you are building the QName on the fly, so it does not fall under the cm (ContentModel) namespace. Your query then becomes
String query = "#\\:customProp1:\"prop1\"";
Hope this works for you.
First, double-check whether you're simply seeing eventual consistency, as described at the link below. If you are, and if this presents a problem for you, consider switching to CMIS queries while staying on SOLR.
http://docs.alfresco.com/5.1/concepts/solr-event-consistency.html
Other than that, check whether the node has been indexed at all. If it has, take a closer look at how you build your query.
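If the query is simple metadata matching, one possible workaround is to steer that single query to the database instead of the index. This is a hedged sketch: QueryConsistency is the standard Alfresco API for this, but transactional queries support only a subset of the FTS syntax, and the property naming here is illustrative, matching the question's code:
// Ask Alfresco to answer this one query from the database (transactional),
// sidestepping the SOLR index and its eventual consistency.
SearchParameters sp = new SearchParameters();
sp.addStore(storeRef);
sp.setLanguage(SearchService.LANGUAGE_FTS_ALFRESCO);
sp.setQuery("=customProp1:\"prop1\""); // '=' forces an exact metadata match
sp.setQueryConsistency(QueryConsistency.TRANSACTIONAL);
ResultSet results = serviceRegistry.getSearchService().query(sp);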

Fetch all facet records against a specific field in Lucene

I'm a beginner and have started learning Lucene. I've implemented a facet-counting program in Lucene 6.0.2 which outputs the counts for all facet fields. But now I want to search for a city, "California", and have the result show all facet counts with respect to this query. How do I do this?
Here is the code:
public List<FacetResult> runSearch() throws IOException {
    DirectoryReader indexReader = DirectoryReader.open(indexDir);
    IndexSearcher searcher = new IndexSearcher(indexReader);
    TaxonomyReader taxoReader = new DirectoryTaxonomyReader(taxoDir);
    FacetsConfig config = new FacetsConfig();
    FacetsCollector fc = new FacetsCollector();
    FacetsCollector.search(searcher, new MatchAllDocsQuery(), 10, fc);
    List<FacetResult> results = new ArrayList<>();
    Facets facets = new FastTaxonomyFacetCounts(taxoReader, config, fc);
    results.add(facets.getTopChildren(10, "city"));
    results.add(facets.getTopChildren(10, "make"));
    results.add(facets.getTopChildren(10, "year"));
    results.add(facets.getTopChildren(10, "model"));
    indexReader.close();
    taxoReader.close();
    return results;
}
From your example I gather that your first search matched everything (*). To search for "* AND city:california" (pretty useless in this case, but it works as an example) you can build a DrillDownQuery from your original query and the filter, e.g.
Query baseQuery = new MatchAllDocsQuery();
DrillDownQuery ddQuery = new DrillDownQuery(config, baseQuery);
ddQuery.add("city", "california");
FacetsCollector fc = new FacetsCollector();
FacetsCollector.search(searcher, ddQuery, 10, fc);
add can be called multiple times. If you call it again for the same field, the values are OR'ed together; if you call it for a different field, that restriction is AND'ed, e.g. (* AND (city:california OR city:york) AND year:1980), as in the sketch below.
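A short sketch of that behavior, reusing the config and searcher from above (the values are illustrative):
DrillDownQuery ddQuery = new DrillDownQuery(config, new MatchAllDocsQuery());
ddQuery.add("city", "california");
ddQuery.add("city", "york"); // same dimension: OR'ed with the previous value
ddQuery.add("year", "1980"); // different dimension: AND'ed
FacetsCollector fc = new FacetsCollector();
FacetsCollector.search(searcher, ddQuery, 10, fc);
// Effective restriction: * AND (city:california OR city:york) AND year:1980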

How do I normalize a CSV file with Encog?

I need to normalize a CSV file. I followed this article written by Jeff Heaton. This is (some) of my code:
File sourceFile = new File("Book1.csv");
File targetFile = new File("Book1_norm.csv");
EncogAnalyst analyst = new EncogAnalyst();
AnalystWizard wizard = new AnalystWizard(analyst);
wizard.wizard(sourceFile, true, AnalystFileFormat.DECPNT_COMMA);
final AnalystNormalizeCSV norm = new AnalystNormalizeCSV();
norm.analyze(sourceFile, false, CSVFormat.ENGLISH, analyst);
norm.setProduceOutputHeaders(false);
norm.normalize(targetFile);
The only difference between my code and the code in the article is this line:
norm.setOutputFormat(CSVFormat.ENGLISH);
I tried to use it, but it seems that in Encog 3.1.0 that method doesn't exist. The error I get is the following (the problem appears to be the line norm.normalize(targetFile)):
Exception in thread "main" org.encog.app.analyst.AnalystError: Can't find column: 11700
at org.encog.app.analyst.util.CSVHeaders.find(CSVHeaders.java:187)
at org.encog.app.analyst.csv.normalize.AnalystNormalizeCSV.extractFields(AnalystNormalizeCSV.java:77)
at org.encog.app.analyst.csv.normalize.AnalystNormalizeCSV.normalize(AnalystNormalizeCSV.java:192)
at IEinSoftware.main(IEinSoftware.java:55)
I added a FAQ that shows how to normalize a CSV file. http://www.heatonresearch.com/faq/4/2
Here's a function to do it (in C#); of course, you need to create an analyst first:
private EncogAnalyst _analyst;

public void NormalizeFile(FileInfo SourceDataFile, FileInfo NormalizedDataFile)
{
    var wizard = new AnalystWizard(_analyst);
    wizard.Wizard(SourceDataFile, _useHeaders, AnalystFileFormat.DecpntComma);
    var norm = new AnalystNormalizeCSV();
    norm.Analyze(SourceDataFile, _useHeaders, CSVFormat.English, _analyst);
    norm.ProduceOutputHeaders = _useHeaders;
    norm.Normalize(NormalizedDataFile);
}
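Since the question is in Java, a rough Java equivalent of the C# snippet above, using the same Encog 3.x calls that already appear in the question (method and variable names are illustrative):
private EncogAnalyst analyst; // create this and run the wizard on it first

public void normalizeFile(File sourceDataFile, File normalizedDataFile, boolean useHeaders) {
    AnalystWizard wizard = new AnalystWizard(analyst);
    wizard.wizard(sourceDataFile, useHeaders, AnalystFileFormat.DECPNT_COMMA);
    AnalystNormalizeCSV norm = new AnalystNormalizeCSV();
    // pass the same headers flag to both the wizard and the analyze step
    norm.analyze(sourceDataFile, useHeaders, CSVFormat.ENGLISH, analyst);
    norm.setProduceOutputHeaders(useHeaders);
    norm.normalize(normalizedDataFile);
}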

Lucene : Changing the default facet delimiter?

First post on this wonderful site!
My goal is to use hierarchical facets for searching an index using Lucene. However, my facets need to be delimited by a character other than '/' (in this case, '~'). Example:
Categories
Categories~Category1
Categories~Category2
I have created a class that implements the FacetIndexingParams interface (a copy of DefaultFacetIndexingParams with the DEFAULT_FACET_DELIM_CHAR param set to '~').
Paraphrased indexing code (using FSDirectory for both index and taxonomy):
StandardAnalyzer analyzer = new StandardAnalyzer(Version.LUCENE_34)
IndexWriterConfig config = new IndexWriterConfig(Version.LUCENE_34, analyzer)
IndexWriter writer = new IndexWriter(indexDir, config)
TaxonomyWriter taxo = new LuceneTaxonomyWriter(taxDir, OpenMode.CREATE)
Document doc = new Document()
// Add bunch of Fields... hidden for the sake of brevity
List<CategoryPath> categories = new ArrayList<CategoryPath>()
row.tags.split('\\|').each{ tag ->
    def cp = new CategoryPath()
    tag.split('~').each{
        cp.add(it)
    }
    categories.add(cp)
}
NewFacetIndexingParams facetIndexingParams = new NewFacetIndexingParams()
DocumentBuilder categoryDocBuilder = new CategoryDocumentBuilder(taxo, facetIndexingParams)
categoryDocBuilder.setCategoryPaths(categories).build(doc)
writer.addDocument(doc)
// Commit and close both writer and taxo.
// Commit and close both writer and taxo.
Search code paraphrased:
// Create index and taxonomy readers to get info from index and taxonomy
IndexReader indexReader = IndexReader.open(indexDir)
TaxonomyReader taxo = new LuceneTaxonomyReader(taxDir)
Searcher searcher = new IndexSearcher(indexReader)
QueryParser parser = new QueryParser(Version.LUCENE_34, "content", new StandardAnalyzer(Version.LUCENE_34))
parser.setAllowLeadingWildcard(true)
Query q = parser.parse(query)
TopScoreDocCollector tdc = TopScoreDocCollector.create(10, true)
List<FacetResult> res = null
NewFacetIndexingParams facetIndexingParams = new NewFacetIndexingParams()
FacetSearchParams facetSearchParams = new FacetSearchParams(facetIndexingParams)
CountFacetRequest cfr = new CountFacetRequest(new CategoryPath(""), 99)
cfr.setDepth(2)
cfr.setSortBy(SortBy.VALUE)
facetSearchParams.addFacetRequest(cfr)
FacetsCollector facetsCollector = new FacetsCollector(facetSearchParams, indexReader, taxo)
def cp = new CategoryPath("Category~Category1", (char)'~')
searcher.search(DrillDown.query(q, cp), MultiCollector.wrap(tdc, facetsCollector))
The results are always returned as a list of facets in the form "Category/Category1".
I have used the Luke tool to look at the index and it appears the facets are being delimited by the '~' character in the index.
What is the best route to do this? Any help is greatly appreciated!
I have figured out the issue. The search and indexing work as they are supposed to. The problem was how I was getting the facet results. I was using:
res = facetsCollector.getFacetResults()
res.each{ result ->
    result.getFacetResultNode().getLabel().toString()
}
What I needed to use was:
res = facetsCollector.getFacetResults()
res.each{ result ->
    result.getFacetResultNode().getLabel().toString((char)'~')
}
The difference is the parameter sent to the toString function!
Easy to overlook, tough to find.
Hope this helps others.
