ForEach step on Gremlin - java

I have a neo4j query like:
...
"WITH DISTINCT k " +
// classic for each loop for the new rankings information
"FOREACH (app in $apps | " +
// upsert the app
" MERGE (a:App{appId:app.appId}) " +
...
// end of loop
") " +
I'm using gremlin-java. Here I want to pass $apps as a custom parameter. I've checked the Gremlin documentation, but I couldn't find a foreach step. Is there a suggested equivalent, something like:
graph.foreach(apps: map)...
Solved with:
...constant($apps).unfold().as(app)...

As you noted, you can use a constant step to inject a value into a query. However, you can also use the inject step to insert a collection of values in a similar way. Here are a couple of simple examples; you can extend these patterns to include id, label, and multiple property values as needed.
gremlin> g.inject([[id:1],[id:2],[id:3],[id:4]]).
unfold().as('a').
addV('test').
property('SpecialId',select('a').select('id'))
==>v[61367]
==>v[61369]
==>v[61371]
==>v[61373]
gremlin> g.V().hasLabel('test').valueMap(true)
==>[id:61367,label:test,SpecialId:[1]]
==>[id:61369,label:test,SpecialId:[2]]
==>[id:61371,label:test,SpecialId:[3]]
==>[id:61373,label:test,SpecialId:[4]]
gremlin> g.inject(1,2,3,4).as('a').
addV('test2').
property('SpecialId',select('a'))
==>v[61375]
==>v[61377]
==>v[61379]
==>v[61381]
gremlin> g.V().hasLabel('test2').valueMap(true)
==>[id:61375,label:test2,SpecialId:[1]]
==>[id:61377,label:test2,SpecialId:[2]]
==>[id:61379,label:test2,SpecialId:[3]]
==>[id:61381,label:test2,SpecialId:[4]]
gremlin>
The first query injects a list of maps; the second, a simple list. This is a bit like the UNWIND pattern you may be used to in Cypher, and it works in a similar way.
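If you are driving this from gremlin-java rather than the console, the same pattern translates fairly directly. The sketch below is only an illustration, not the original query: it assumes an already-connected GraphTraversalSource named g and maps that carry an appId key, and it uses the common fold()/coalesce() idiom to approximate Cypher's FOREACH + MERGE upsert.

import java.util.List;
import java.util.Map;

import org.apache.tinkerpop.gremlin.process.traversal.dsl.graph.GraphTraversalSource;
import org.apache.tinkerpop.gremlin.process.traversal.dsl.graph.__;

public class AppUpsertSketch {

    // Mirrors the console example above: inject the list, unfold it, add one vertex per map.
    static void addApps(GraphTraversalSource g, List<Map<String, Object>> apps) {
        g.inject(apps)
         .unfold().as("a")
         .addV("App")
         .property("appId", __.select("a").select("appId"))
         .iterate();
    }

    // Approximates FOREACH + MERGE: reuse the vertex if it exists, otherwise create it.
    static void upsertApps(GraphTraversalSource g, List<Map<String, Object>> apps) {
        for (Map<String, Object> app : apps) {
            Object appId = app.get("appId");
            g.V().has("App", "appId", appId)
             .fold()
             .coalesce(__.unfold(),
                       __.addV("App").property("appId", appId))
             .next();
        }
    }
}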

Related

JPA Select query not returning results with one letter word

I have a query that, when given a search term starting with a one-letter word followed by a space character and then another word (e.g. "T Distribution"), does not return results, while "Distribution" alone returns results, including those for "T Distribution". The behavior is the same for all search terms beginning with a one-letter word followed by a space character and then another word.
The problem appears when the search term is of this pattern:
"[one-letter][space][letter/word]", for example: "o ring".
What could be the problem that makes the LIKE operator not work correctly in this case?
Here is my query:
@Cacheable(value = "filteredConcept")
@Query("SELECT NEW sina.backend.data.model.ConceptSummaryVer04(s.id, s.arabicGloss, s.englishGloss, s.example, s.dataSourceId, "
        + "s.synsetFrequnecy, s.arabicWordsCache, s.englishWordsCache, s.superId, s.categoryId, s.dataSourceCacheAr, s.dataSourceCacheEn, "
        + "s.superTypeCasheAr, s.superTypeCasheEn, s.area, s.era, s.rank, s.undiacritizedArabicWordsCache, s.normalizedEnglishWordsCache, "
        + "s.isTranslation, s.isGloss, s.arabicSynonymsCount, s.englishSynonymsCount) FROM Concept s "
        + "where s.undiacritizedArabicWordsCache LIKE %:searchTerm% AND data_source_id != 200 AND data_source_id != 31")
List<ConceptSummaryVer04> findByArabicWordsCacheAndNotConcept(@Param("searchTerm") String searchTerm, Sort sort);
The result of the query on the database itself:
link to screenshot
Results on the database are returned regardless of letter case:
link to screenshot
I solved this problem.
It was due to the default configuration of the full-text index in the MySQL database, which by default requires a minimum word length of 2 (ft_min_word_len = 2).
I changed that and rebuilt the index. After that, one-letter words were returned by the query.
See 12.9.6 Fine-Tuning MySQL Full-Text Search.
Use some quotes:
LIKE '%:searchTerm%';
Set searchTerm = "%your_word%" and use it in the query like this:
... s.undiacritizedArabicWordsCache LIKE :searchTerm ...
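To make that last suggestion concrete: bind the wildcards into the parameter value instead of writing them into the JPQL. The sketch below is only an illustration of that pattern; the repository name, method name, and return type are assumptions, not the poster's actual code, and Concept is assumed to be the entity from the question.

import java.util.List;

import org.springframework.data.domain.Sort;
import org.springframework.data.jpa.repository.JpaRepository;
import org.springframework.data.jpa.repository.Query;
import org.springframework.data.repository.query.Param;

// Hypothetical repository illustrating the "wrap the term in % before binding" pattern.
public interface ConceptRepository extends JpaRepository<Concept, Long> {

    @Query("SELECT s FROM Concept s WHERE s.undiacritizedArabicWordsCache LIKE :searchTerm")
    List<Concept> findByWordsCache(@Param("searchTerm") String searchTerm, Sort sort);
}

// Caller side: the wildcards become part of the bound value, so "T Distribution",
// "o ring", etc. are matched the same way as longer terms:
// List<Concept> hits = conceptRepository.findByWordsCache("%" + userInput + "%", sort);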

Aggregating within Apache Jena

I'm using the Java API of Apache Jena to store and retrieve documents and the words within them. For this I decided to set up the following data structure:
_dataset = TDBFactory.createDataset("./database");
_dataset.begin(ReadWrite.WRITE);
Model model = _dataset.getDefaultModel();
Resource document= model.createResource("http://name.space/Source/DocumentA");
document.addProperty(RDF.value, "Document A");
Resource word = model.createResource("http://name.space/Word/aword");
word.addProperty(RDF.value, "aword");
Resource resource = model.createResource();
resource.addProperty(RDF.value, word);
resource.addProperty(RSS.items, "5");
document.addProperty(RDF.type, resource);
_dataset.commit();
_dataset.end();
The code example above represents a document ("Document A") in which one word ("aword") occurs five (5) times. The occurrences of a word in a document are counted and stored as a property. A word can also occur in other documents, so the occurrence count relating a specific word to a specific document is attached to a blank node. (I'm not entirely sure if this structure makes any sense as I'm fairly new to this way of storing information, so please feel free to provide better solutions!)
My major question is: How can I get a list of all distinct words and the sum of their occurences over all documents?
Your data model is a bit unconventional, in my opinion. With your code, you'll end up with data that looks like this (in Turtle notation), and which uses rdf:type and rdf:value in unconventional ways:
:doc      rdf:value "document a" ;
          rdf:type  :resource .
:resource rdf:value :word ;
          :items    5 .
:word     rdf:value "aword" .
It's unusual, because usually you wouldn't have such complex information on the type attribute of a resource. From the SPARQL standpoint though, rdf:type and rdf:value are properties just like any other, and you can still retrieve the information you're looking for with a simple query. It would look more or less like this (though you'll need to define some prefixes, etc.):
select ?word (sum(?n) as ?nn) where {
  ?document rdf:type ?type .
  ?type rdf:value/rdf:value ?word ;
        :items ?n .
}
group by ?word
That query will produce a result for each word, and with each will be the sum of all the values of the :items properties associated with the word. There are lots of questions on Stack Overflow that have examples of running SPARQL queries with Jena. E.g., (the first one that I found with Google): Query Jena TDB store.
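If it helps, here is a rough sketch of running that aggregate query from Java against the TDB dataset from the question. It uses the current org.apache.jena packages (older releases used com.hp.hpl.jena.*), and it assumes the ":" prefix from the answer should resolve to the RSS namespace that RSS.items writes (http://purl.org/rss/1.0/).

import org.apache.jena.query.Dataset;
import org.apache.jena.query.QueryExecution;
import org.apache.jena.query.QueryExecutionFactory;
import org.apache.jena.query.QuerySolution;
import org.apache.jena.query.ReadWrite;
import org.apache.jena.query.ResultSet;
import org.apache.jena.tdb.TDBFactory;

public class WordCounts {
    public static void main(String[] args) {
        Dataset dataset = TDBFactory.createDataset("./database");
        dataset.begin(ReadWrite.READ);
        try {
            String query =
                "PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> " +
                "PREFIX rss: <http://purl.org/rss/1.0/> " +   // assumed namespace of RSS.items
                "SELECT ?word (SUM(?n) AS ?nn) WHERE { " +
                "  ?document rdf:type ?type . " +
                "  ?type rdf:value/rdf:value ?word ; " +
                "        rss:items ?n . " +
                "} GROUP BY ?word";

            try (QueryExecution qexec = QueryExecutionFactory.create(query, dataset.getDefaultModel())) {
                ResultSet results = qexec.execSelect();
                while (results.hasNext()) {
                    QuerySolution row = results.next();
                    System.out.println(row.get("word") + " -> " + row.get("nn"));
                }
            }
        } finally {
            dataset.end();
        }
    }
}

Note that the question's code stores the count as the string "5"; if the sum comes back unbound, either store the count as a number or cast ?n to an integer inside the SUM.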

How to retrieve the Field that "hit" in Lucene

Maybe I'm really missing something.
I have indexed a bunch of key/value pairs in Lucene (v4.1 if it matters). Say I have
key1=value1 and key2=value2, e.g. as read from a properties file.
They get indexed both as specific fields and into a catchall "ALL" field, e.g.
new Field("key1", "value1", aFieldTypeMimickingKeywords);
new Field("key2", "value2", aFieldTypeMimickingKeywords);
new Field("ALL", "key1=value1", aFieldTypeMimickingKeywords);
new Field("ALL", "key2=value2", aFieldTypeMimickingKeywords);
// then get added to the Document of course...
I can then do a wildcard search, using
new WildcardQuery(new Term("ALL", "*alue1"));
and it will find the hit.
But, it would be nice to get more info, like "what was complete value (e.g. "key1=value1") that goes with that hit?".
The best I can figure out is to get the Document, then get the list of IndexableFields, then loop over all of them and see if field.stringValue().contains("alue1"). (I can look at the data structures in the debugger and all the info is there.)
This seems completely insane, because isn't that what Lucene just did? Shouldn't the hit information return some of the Fields?
Is Lucene missing what seems like "obvious" functionality? Googling and staring at the APIs hasn't revealed anything straightforward, but I feel like I must be searching for the wrong stuff.
You might want to try the IndexSearcher.explain() method. Once you get the ID of the matching document, prepare a query for each field (using the same search keywords) and invoke Explanation.isMatch() for each query: the ones that yield true will give you the matched field. Example:
for (String field : fields) {
    Query query = new WildcardQuery(new Term(field, "*alue1"));
    Explanation ex = searcher.explain(query, docID);
    if (ex.isMatch()) {
        // Your query matched this field
    }
}
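For completeness, here is a hedged sketch of how fields and docID in that loop might be obtained: run the catch-all query first, then take the candidate field names from the stored document. The variable names and the Directory setup are assumptions, not part of the original answer.

import java.util.ArrayList;
import java.util.List;

import org.apache.lucene.document.Document;
import org.apache.lucene.index.DirectoryReader;
import org.apache.lucene.index.IndexableField;
import org.apache.lucene.index.Term;
import org.apache.lucene.search.Explanation;
import org.apache.lucene.search.IndexSearcher;
import org.apache.lucene.search.Query;
import org.apache.lucene.search.ScoreDoc;
import org.apache.lucene.search.TopDocs;
import org.apache.lucene.search.WildcardQuery;
import org.apache.lucene.store.Directory;

public class MatchedFieldExample {
    // 'directory' is assumed to hold the index described in the question.
    static void printMatchedFields(Directory directory) throws Exception {
        try (DirectoryReader reader = DirectoryReader.open(directory)) {
            IndexSearcher searcher = new IndexSearcher(reader);
            TopDocs hits = searcher.search(new WildcardQuery(new Term("ALL", "*alue1")), 10);

            for (ScoreDoc hit : hits.scoreDocs) {
                int docID = hit.doc;
                Document doc = searcher.doc(docID);

                // Candidate fields = the stored fields of the matching document.
                List<String> fields = new ArrayList<>();
                for (IndexableField f : doc.getFields()) {
                    fields.add(f.name());
                }

                // Per-field explain() loop from the answer above.
                for (String field : fields) {
                    Query query = new WildcardQuery(new Term(field, "*alue1"));
                    Explanation ex = searcher.explain(query, docID);
                    if (ex.isMatch()) {
                        System.out.println("matched field: " + field + " = " + doc.get(field));
                    }
                }
            }
        }
    }
}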

Find out which field matched term in custom score script

I am using a custom score query with a multiMatchQuery. Ultimately what I want is simple and requires little explanation. In my Java custom score script, I want to be able to find out which field a result matched on.
Example:
If I search for Starbucks and a result comes back with the name Starbucks, then I want to be able to know that name.basic was the field that matched my query. If I search for coffee and Starbucks comes back, I want to be able to know that tags was the field that matched.
Is there any way to do this?
Search Query Code:
def basicSearchableSearch(t: String, lat: Double, lon: Double, r: Double, z: Int, bb: BoundingBox, max: Int): SearchResponse = {
  val multiQuery = filteredQuery(
    multiMatchQuery(t)
      // Matches businesses and POIs
      .field("name.basic").operator(Operator.OR)
      .field("name.no_space")
      // Businesses only
      .field("tags").boost(6f),
    geoBoundingBoxFilter("location")
      .bottomRight(bb.botRight.y, bb.botRight.x)
      .topLeft(bb.topLeft.y, bb.topLeft.x)
  )
  val customQuery = customScoreQuery(
    multiQuery
  )
    .script("customJavaScript")
    .lang("native")
    .param("lat", lat)
    .param("lon", lon)
    .param("zoom", z)

  global.Global.getClient().prepareSearch("searchable")
    .setSearchType(SearchType.DFS_QUERY_THEN_FETCH)
    .setQuery(customQuery)
    .setFrom(0).setSize(max)
    .execute()
    .actionGet()
}
It's only simple for simple queries. For complex queries, the question of which field matched is actually quite nontrivial, so I cannot think of any efficient way to do it.
Perhaps, you could consider moving your custom score calculation closer to the match. The multi_match query is basically a shortcut for a set of match queries on the same query string combined by a dis_max query. So, you are currently building something like this:
custom_score(
    filtered(
        dis_max(match_1, match_2, match_3)
    )
)
What you can do is to move your custom_score under dis_max and build something like this:
filtered(
    dis_max(
        custom_score_1(match_1),
        custom_score_2(match_2),
        custom_score_3(match_3)
    )
)
Obviously, this will be a somewhat different query, since dis_max will operate on custom score instead of original score.
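In the pre-1.0 Java API that the question's Scala code is built on, the restructured query could be assembled roughly like this. This is only a sketch of the shape described above: the field names, script name, and geo bounding-box filter are lifted from the question, the helper method and everything else are assumptions, and the per-field script params are omitted.

import static org.elasticsearch.index.query.FilterBuilders.geoBoundingBoxFilter;
import static org.elasticsearch.index.query.QueryBuilders.customScoreQuery;
import static org.elasticsearch.index.query.QueryBuilders.disMaxQuery;
import static org.elasticsearch.index.query.QueryBuilders.filteredQuery;
import static org.elasticsearch.index.query.QueryBuilders.matchQuery;

import org.elasticsearch.index.query.QueryBuilder;

public class PerFieldCustomScore {

    // One custom_score wrapper per match query, so the script knows which field it scores.
    static QueryBuilder scored(String field, String text) {
        return customScoreQuery(matchQuery(field, text))
                .script("customJavaScript")
                .lang("native");
                // .param("lat", ...), .param("lon", ...), .param("zoom", ...) as needed
    }

    static QueryBuilder build(String t, double top, double left, double bottom, double right) {
        return filteredQuery(
                disMaxQuery()
                        .add(scored("name.basic", t))
                        .add(scored("name.no_space", t))
                        .add(scored("tags", t)),
                geoBoundingBoxFilter("location")
                        .topLeft(top, left)
                        .bottomRight(bottom, right));
    }
}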

Faceting using SolrJ and Solr4

I've gone through the related questions on this site but haven't found a relevant solution.
When querying my Solr4 index using an HTTP request of the form
&facet=true&facet.field=country
the response contains all the different countries along with counts per country.
How can I get this information using SolrJ?
I have tried the following but it only returns total counts across all countries, not per country:
solrQuery.setFacet(true);
solrQuery.addFacetField("country");
The following does seem to work, but I do not want to have to explicitly set all the groupings beforehand:
solrQuery.addFacetQuery("country:usa");
solrQuery.addFacetQuery("country:canada");
Secondly, I'm not sure how to extract the facet data from the QueryResponse object.
So two questions:
1) Using SolrJ how can I facet on a field and return the groupings without explicitly specifying the groups?
2) Using SolrJ how can I extract the facet data from the QueryResponse object?
Thanks.
Update:
I also tried something similar to Sergey's response (below).
List<FacetField> ffList = resp.getFacetFields();
log.info("size of ffList:" + ffList.size());

for (FacetField ff : ffList) {
    String ffname = ff.getName();
    int ffcount = ff.getValueCount();
    log.info("ffname:" + ffname + "|ffcount:" + ffcount);
}
The above code shows ffList with size=1 and the loop goes through 1 iteration. In the output ffname="country" and ffcount is the total number of rows that match the original query.
There is no per-country breakdown here.
I should mention that on the same solrQuery object I am also calling addField and addFilterQuery. Not sure if this impacts faceting:
solrQuery.addField("user-name");
solrQuery.addField("user-bio");
solrQuery.addField("country");
solrQuery.addFilterQuery("user-bio:" + "(Apple OR Google OR Facebook)");
Update 2:
I think I got it, again based on what Sergey said below. I extracted the List object using FacetField.getValues().
List<FacetField> fflist = resp.getFacetFields();
for (FacetField ff : fflist) {
    String ffname = ff.getName();
    int ffcount = ff.getValueCount();
    List<Count> counts = ff.getValues();
    for (Count c : counts) {
        String facetLabel = c.getName();
        long facetCount = c.getCount();
    }
}
In the above code, the facetLabel variable holds each facet group and facetCount is the corresponding count for that grouping.
Actually, you only need to set the facet field and faceting will be activated (check the SolrJ source code):
solrQuery.addFacetField("country");
Where did you look for the facet information? It must be in QueryResponse.getFacetFields() (then getValues() and getCount()).
In the Solr response you should use QueryResponse.getFacetFields() to get the List of FacetFields, among which "country" figures; so "country" is identified by QueryResponse.getFacetFields().get(0).
You then iterate over it to get the List of Count objects using
QueryResponse.getFacetFields().get(0).getValues().get(i)
and get the name of each facet value using
QueryResponse.getFacetFields().get(0).getValues().get(i).getName()
and the corresponding count using
QueryResponse.getFacetFields().get(0).getValues().get(i).getCount()
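Putting the two answers together, an end-to-end SolrJ 4.x sketch might look like the following. The core URL is an assumption, and the facet extraction mirrors Update 2 above.

import org.apache.solr.client.solrj.SolrQuery;
import org.apache.solr.client.solrj.impl.HttpSolrServer;
import org.apache.solr.client.solrj.response.FacetField;
import org.apache.solr.client.solrj.response.QueryResponse;

public class CountryFacetExample {
    public static void main(String[] args) throws Exception {
        // URL is an assumption; point it at your Solr4 core.
        HttpSolrServer solr = new HttpSolrServer("http://localhost:8983/solr/collection1");

        SolrQuery q = new SolrQuery("*:*");
        q.addFacetField("country");   // also switches faceting on, per the first answer
        q.setRows(0);                 // only the per-country counts are needed here

        QueryResponse resp = solr.query(q);
        FacetField country = resp.getFacetField("country");
        for (FacetField.Count c : country.getValues()) {
            System.out.println(c.getName() + " -> " + c.getCount());
        }
    }
}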
