I have a string that may contain spaces. I would like to replace those spaces with a regex that matches \t, \r and \n. After the replacement I would like to call regexp_like (an Oracle function) to match a field against this string.
I know it is possible to call db functions using criteria builder as described for example in this link
I am not very familiar with the differences between regex in Java and in Oracle, or with how to put this together (I have never called functions from CriteriaBuilder). Here are my tentative steps, with the places where I am stuck noted in the comments:
// first replace each run of spaces with a regex for \t, \n, \r; value is the original string
// (the result must be assigned, and the replacement quoted so that its backslashes survive)
value = value.replaceAll(" +", java.util.regex.Matcher.quoteReplacement("[\\t\\n\\r]+"));
// build the db function call expression; it seems I can't use table.<String>get(field) and can't pass value as a string
Expression<String> regExp = cb.function("regexp_like", String.class, table.<String>get(field), value);
// now create a predicate not sure how
Predicate fieldMatch = cb.equal(...)
Is this possible?
It's possible. You only need to make a few small changes.
Extend your Oracle dialect:
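import org.hibernate.dialect.Oracle10gDialect;
import org.hibernate.dialect.function.SQLFunctionTemplate;
import org.hibernate.type.StandardBasicTypes;

public class Oracle10gCustomDialect extends Oracle10gDialect {
    public Oracle10gCustomDialect() {
        super();
        // Wrap regexp_like in a CASE expression so it can be used as a boolean-valued function.
        registerFunction("regexp_like", new SQLFunctionTemplate(StandardBasicTypes.BOOLEAN,
                "(case when (regexp_like(?1, ?2)) then 1 else 0 end)"));
    }
}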
Then set this class (by its fully qualified name) as the dialect in your application.properties or Hibernate properties configuration.
Then, in your specification, do something like this:
Expression<Boolean> regExprLike = criteriaBuilder.function("regexp_like", Boolean.class, root.get("yourColumn"), criteriaBuilder.literal("<your regexp value>"));
predicates.add(criteriaBuilder.isTrue(regExprLike));
...
And that's all!
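For the first step (turning runs of spaces into a whitespace class), note that String.replaceAll returns a new string and gives backslashes in the replacement a special meaning. A minimal sketch in plain Java follows; the class and method names are hypothetical, and whether Oracle reads \t, \n and \r inside a bracket expression the same way Java does is an assumption to check against your database (a POSIX class such as [[:space:]] may be the safer choice there).

```java
import java.util.regex.Matcher;

class WhitespaceRegex {
    // Hypothetical helper: replaces each run of spaces with a regex fragment
    // meant to match one or more tab, newline or carriage-return characters.
    static String spacesToWhitespaceClass(String value) {
        // The fragment as literal text: [\t\n\r]+ (backslashes included).
        String fragment = "[\\t\\n\\r]+";
        // quoteReplacement keeps the backslashes; without it, replaceAll would
        // read "\t" in the replacement as an escaped literal 't'.
        return value.replaceAll(" +", Matcher.quoteReplacement(fragment));
    }

    public static void main(String[] args) {
        System.out.println(spacesToWhitespaceClass("foo  bar baz"));
    }
}
```

The returned string is what would be passed to criteriaBuilder.literal(...) in the specification above. Any other regex metacharacters in the original value are left untouched by this sketch.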
I have a query that returns no results when given a search term that starts with a one-letter word followed by a space and then another word (e.g. "T Distribution"). Searching for "Distribution" alone, however, returns results, including the ones for "T Distribution". The behavior is the same for every search term that begins with a one-letter word followed by a space and then another word.
The problem appears when the search term is of this pattern:
"[one-letter][space][letter/word]". example: "o ring".
What could cause the LIKE operator to not work correctly in this case?
Here is my query:
@Cacheable(value = "filteredConcept")
@Query("SELECT NEW sina.backend.data.model.ConceptSummaryVer04(s.id, s.arabicGloss, s.englishGloss, s.example, s.dataSourceId, "
        + "s.synsetFrequnecy, s.arabicWordsCache, s.englishWordsCache, s.superId, s.categoryId, s.dataSourceCacheAr, s.dataSourceCacheEn, "
        + "s.superTypeCasheAr, s.superTypeCasheEn, s.area, s.era, s.rank, s.undiacritizedArabicWordsCache, s.normalizedEnglishWordsCache, "
        + "s.isTranslation, s.isGloss, s.arabicSynonymsCount, s.englishSynonymsCount) FROM Concept s "
        + "where s.undiacritizedArabicWordsCache LIKE %:searchTerm% AND data_source_id != 200 AND data_source_id != 31")
List<ConceptSummaryVer04> findByArabicWordsCacheAndNotConcept(@Param("searchTerm") String searchTerm, Sort sort);
the result of the query on the database itself:
link to screenshot
results on the database are returned no matter the letters case:
link to screenshot
I solved this problem.
It was caused by the minimum word length of the full-text index on the MySQL database, which was set to 2 (ft_min_word_len = 2).
I changed that and rebuilt the index. Then, one-letter words were returned by the query.
12.9.6 Fine-Tuning MySQL Full-Text Search
Use some quotes:
LIKE '%:searchTerm%';
Set searchTerm = "%your_word%" and use it in the query like this:
... s.undiacritizedArabicWordsCache LIKE :searchTerm ...
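Building the wildcard pattern on the Java side could look like the sketch below. The helper name is hypothetical, and the escaping of % and _ assumes the query declares backslash as the LIKE escape character, which the original query does not do; how that clause is written depends on the database.

```java
class LikePatterns {
    // Hypothetical helper: builds a "contains" pattern for a LIKE parameter.
    // Assumes backslash is declared as the LIKE escape character in the query.
    static String contains(String term) {
        String escaped = term
                .replace("\\", "\\\\") // escape the escape character first
                .replace("%", "\\%")   // keep a literal percent sign literal
                .replace("_", "\\_");  // keep a literal underscore literal
        return "%" + escaped + "%";
    }

    public static void main(String[] args) {
        // The result is what would be bound to :searchTerm.
        System.out.println(contains("T Distribution"));
    }
}
```

If the search terms never contain % or _, the escaping can be dropped and the helper reduces to "%" + term + "%".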
I have following hql query,
from Channe where ip='1.11.6.0';
But in the db the IP is saving as 1.11.6.0:8080 .
So I need to modify the query so that it splits the stored ip at the delimiter ':' and compares only the first part. I do not want to change the search value to 1.11.6.0:8080.
See this page in the Hibernate docs. On the page below there is a section called 14.10. Expressions
http://docs.jboss.org/hibernate/orm/3.3/reference/en/html/queryhql.html
It says, among other things:
string concatenation ...||... or concat(...,...) current_date(),
...
Any function or operator defined by EJB-QL 3.0: substring(), trim(), lower(), upper(),
length(), locate(), abs(), sqrt(), bit_length(), mod()
But you are actually better off doing as @Hansraj suggests in the comments and appending a wildcard to your search term:
String query = "from Channe where ip like :term";
entityManager.createQuery(query).setParameter("term",ipString + "%");
This assumes that your data type is string, of course.
Try the following:
Say variable ip had the address
ip = "10.131.56.40:8080";
var ipSplit = ip.Split(':');
var ipStart = ipSplit[0];
ipStart will store only 10.131.56.40
This could solve your problem
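Since the question is about Java and HQL, the same split might be written in plain Java as sketched below. The class and method names are hypothetical; hostLikePattern is an assumed variation on the wildcard idea that includes the ':' so that 1.11.6.0 does not also match 1.11.6.01.

```java
class IpAddresses {
    // Hypothetical helper: returns the part of "host:port" before the first
    // colon, or the whole string when there is no colon.
    static String hostPart(String address) {
        int colon = address.indexOf(':');
        return colon < 0 ? address : address.substring(0, colon);
    }

    // Hypothetical helper: a LIKE parameter matching the host followed by any port.
    static String hostLikePattern(String host) {
        return host + ":%";
    }

    public static void main(String[] args) {
        System.out.println(hostPart("10.131.56.40:8080"));
        System.out.println(hostLikePattern("1.11.6.0"));
    }
}
```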
Try this:
SPLIT(".", FIELDNAME)
I figured out that, of course, "." and spaces aren't allowed. Are there other forbidden characters?
You can use any (UTF8) character in the field name which aren't
special (contains ".", or starts with "$").
https://jira.mongodb.org/browse/SERVER-3229
https://stackoverflow.com/a/7976235/311220
It's generally best to stick with lowercase alphanumeric with underscores though.
Something else to look out for: you can give a property the name "query", but query operators then do not work on it as expected, which makes many queries awkward.
Example:
Insert a document with a property named "query":
db.coll.insert({ query: 'foo' });
Equality query works:
db.coll.findOne({ query: 'foo' });
Not equal ($ne) does not:
db.coll.findOne({ query: { $ne: 'bar' } });
I'm new to this, so here goes.
I'm trying to get a user called "Bob" from MongoDB.
I have the:
UserData ud = MonConMan.instance().getDb().find(UserData.class, "name","bob").get();
"bob" cannot be found if it is stored with a capital letter, as "Bob".
I understand I can get a List and do equalsIgnoreCase, but are there operators I can use instead?
I have users logging on and must test whether they are registered. A user can type their name any way they like, so I must find a way to do equalsIgnoreCase. This is a problem: I cannot fetch all names and do equalsIgnoreCase if there are, say, 10,000 of them. One could of course save all user names in lowercase from the start, but that would destroy the visual appearance of the name.
I am looking at the wiki but cannot see any:
http://code.google.com/p/morphia/wiki/Query
Use a Java regex, like this:
String name = "bob";
// Matches the whole name exactly (anchored with ^ and $), ignoring case.
Pattern pattern = Pattern.compile("^" + name + "$", Pattern.CASE_INSENSITIVE);
// Replace `id` with whatever name you use in UserData for '_id'.
Query<UserData> query = createQuery().field("name").equal(pattern).retrievedFields(true, "id");
UserData user = query.get();
if (user != null) {
    // the user is already registered
} else {
    // the user is new
}
(I am not good at regex, so you may want to read up on ^ and $ somewhere.)
Make sure that the user names you validate a new user against are unique across your system.
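The anchored, case-insensitive match can be tried out in plain Java without a database. In this sketch the class and method names are hypothetical, and wrapping the name in Pattern.quote is an addition to the answer above so that characters such as "." in a user name are matched literally.

```java
import java.util.regex.Pattern;

class NamePatterns {
    // Hypothetical helper: a pattern that matches the given name exactly,
    // ignoring case, with regex metacharacters in the name treated literally.
    static Pattern exactIgnoreCase(String name) {
        return Pattern.compile("^" + Pattern.quote(name) + "$", Pattern.CASE_INSENSITIVE);
    }

    public static void main(String[] args) {
        Pattern p = exactIgnoreCase("bob");
        System.out.println(p.matcher("Bob").find());   // same name, different case
        System.out.println(p.matcher("Bobby").find()); // longer name
    }
}
```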
Ended up keeping two fields like
- lowercaseusername
- originalusername
This way i could search for a user using the lowercaseusername
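A minimal sketch of how the lowercase field might be derived, assuming it is computed once when the user is saved and again for every lookup. The class and method names are hypothetical; Locale.ROOT is used so the result does not depend on the server's default locale.

```java
import java.util.Locale;

class UserNames {
    // Hypothetical helper: the value stored in, and queried against,
    // the lowercaseusername field.
    static String searchKey(String originalUsername) {
        return originalUsername.toLowerCase(Locale.ROOT);
    }

    public static void main(String[] args) {
        String typed = "BoB";
        // originalusername keeps "BoB"; lowercaseusername stores the key.
        System.out.println(searchKey(typed));
    }
}
```

The query then becomes a plain equality check on lowercaseusername, which can use an ordinary index.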
You can find a UserData by name using this code:
Query<UserData> query = createQuery().filter("name","bob");
find(query);
In my application, this code returns all UserData that have a name field with the value "bob".
The code can also be written this way:
Query<UserData> query = createQuery().field("name").equal("bob");
find(query);
This code would live in a UserDataDao that extends BasicDao and receives the Morphia datastore in its constructor.
Using Hibernate Search annotations (mostly just @Field(index = Index.TOKENIZED)) I've indexed a number of fields related to a persisted class of mine called Compound. I've set up text search over all the indexed fields, using the MultiFieldQueryParser, which has so far worked fine.
Among the fields indexed and searchable is a field called compoundName, with sample values:
3-Hydroxyflavone
6,4'-Dihydroxyflavone
When I search for either of these values in full the related Compound instances are returned. However problems occur when I use the partial name and introduce wildcards:
searching for 3-Hydroxyflav* still gives the correct hit, but
searching for 6,4'-Dihydroxyflav* fails to find anything.
As I'm quite new to Lucene / Hibernate Search, I'm not sure where to look at this point. I think it might have something to do with the ' present in the second query, but I don't know how to proceed. Should I look into Tokenizers / Analyzers / QueryParsers or something else entirely?
Or can anyone tell me how I can get the second wildcard search to match, preferably without breaking the MultiField-search behavior?
I'm using Hibernate-Search 3.1.0.GA & Lucene-core 2.9.3.
Some relevant code bits to illustrate my current approach:
Relevant parts of the indexed Compound class:
@Entity
@Indexed
@Data
@EqualsAndHashCode(callSuper = false, of = { "inchikey" })
public class Compound extends DomainObject {
    @NaturalId
    @NotEmpty
    @Length(max = 30)
    @Field(index = Index.TOKENIZED)
    private String inchikey;
    @ManyToOne
    @IndexedEmbedded
    private ChemicalClass chemicalClass;
    @Field(index = Index.TOKENIZED)
    private String commonName;
    ...
}
How I currently search over the indexed fields:
String[] searchfields = Compound.getSearchfields();
MultiFieldQueryParser parser =
new MultiFieldQueryParser(Version.LUCENE_29, searchfields, new StandardAnalyzer(Version.LUCENE_29));
FullTextSession fullTextSession = Search.getFullTextSession(getSession());
FullTextQuery fullTextQuery =
fullTextSession.createFullTextQuery(parser.parse("searchterms"), Compound.class);
List<Compound> hits = fullTextQuery.list();
Use WhitespaceAnalyzer instead of StandardAnalyzer. It splits only at whitespace, not at commas, hyphens and so on. (It does not lowercase the tokens, though, so you will need to build your own chain of whitespace tokenizer plus lowercase filter if you want your search to be case-insensitive.) If you need different behavior for different fields, you can use a PerFieldAnalyzerWrapper.
You can't just set it to un-tokenized, because that will interpret your entire body of text as one token.
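To see what a whitespace-plus-lowercase chain would do to the compound names, here is a plain-Java stand-in. It does not use the Lucene API; the class and method names are hypothetical, and matchesPrefix only imitates a trailing-* wildcard against the resulting tokens.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

class WhitespaceLowercaseTokens {
    // Stand-in for "whitespace tokenizer + lowercase filter": split at
    // whitespace only, then lowercase each token.
    static List<String> tokenize(String text) {
        List<String> tokens = new ArrayList<>();
        for (String part : text.trim().split("\\s+")) {
            if (!part.isEmpty()) {
                tokens.add(part.toLowerCase(Locale.ROOT));
            }
        }
        return tokens;
    }

    // True when any token starts with the lowercased prefix.
    static boolean matchesPrefix(String text, String prefix) {
        String p = prefix.toLowerCase(Locale.ROOT);
        for (String token : tokenize(text)) {
            if (token.startsWith(p)) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        System.out.println(tokenize("6,4'-Dihydroxyflavone"));
        System.out.println(matchesPrefix("6,4'-Dihydroxyflavone", "6,4'-Dihydroxyflav"));
    }
}
```

Because the name stays a single token, a prefix of the full name can match it, which is the behavior the wildcard search needs.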
I think your problem is a combination of analyzer and query language problems. It is hard to say what exactly causes it. To find out, I recommend you inspect your index using the Lucene index tool Luke.
Since you are not using a custom analyzer in your Hibernate Search configuration, the default, StandardAnalyzer, is used. This is consistent with the fact that you use StandardAnalyzer in the constructor of MultiFieldQueryParser (always use the same analyzer for indexing and searching!). What I am not so sure of is how "6,4'-Dihydroxyflavone" gets tokenized by StandardAnalyzer. That's the first thing you have to find out. For example, the javadoc says:
Splits words at hyphens, unless
there's a number in the token, in
which case the whole token is
interpreted as a product number and is
not split.
It might be that you need to write your own analyzer which tokenizes your chemical names the way you need it for your use cases.
Next the query parser. Make sure you understand the query syntax - Lucene query syntax. Some characters have special meaning, for example a '-'. It could be that your query is parsed the wrong way.
Either way, the first step is to find out how your chemical names get tokenized. Hope that helps.
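On the query-syntax side, one thing to try is escaping the characters that the Lucene query syntax treats as special before appending the wildcard. Lucene ships QueryParser.escape for this; the helper below is a hypothetical plain-Java stand-in so that the example runs without Lucene on the classpath, and whether escaping alone makes the search match still depends on how the name was tokenized at index time.

```java
class QueryEscaper {
    // Characters with special meaning in the Lucene query syntax.
    private static final String SPECIAL = "+-!(){}[]^\"~*?:\\&|";

    // Hypothetical helper: prefixes every special character with a backslash.
    static String escape(String term) {
        StringBuilder sb = new StringBuilder();
        for (char c : term.toCharArray()) {
            if (SPECIAL.indexOf(c) >= 0) {
                sb.append('\\');
            }
            sb.append(c);
        }
        return sb.toString();
    }

    // Escapes the user-supplied prefix, then appends an unescaped wildcard.
    static String prefixQuery(String prefix) {
        return escape(prefix) + "*";
    }

    public static void main(String[] args) {
        System.out.println(prefixQuery("6,4'-Dihydroxyflav"));
    }
}
```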
I wrote my own analyzer:
import java.util.Set;
import java.util.regex.Pattern;
import org.apache.lucene.index.memory.PatternAnalyzer;
import org.apache.lucene.util.Version;
public class ChemicalNameAnalyzer extends PatternAnalyzer {
private static Version version = Version.LUCENE_29;
private static Pattern pattern = compilePattern();
private static boolean toLowerCase = true;
private static Set stopWords = null;
public ChemicalNameAnalyzer(){
super(version, pattern, toLowerCase, stopWords);
}
public static Pattern compilePattern() {
StringBuilder sb = new StringBuilder();
sb.append("(-{0,1}\\(-{0,1})");//Matches an optional dash followed by an opening round bracket followed by an optional dash
sb.append("|");//"OR" (regex alternation)
sb.append("(-{0,1}\\)-{0,1})");
sb.append("|");//"OR" (regex alternation)
sb.append("((?<=([a-zA-Z]{2,}))-(?=([^a-zA-Z])))");//Matches a dash ("-") preceded by two or more letters and succeeded by a non-letter
return Pattern.compile(sb.toString());
}
}