Accent-insensitive search in a CriteriaBuilder query - java

I'm using CriteriaBuilder and a list of predicates with JPA to query the database. The data contains accented words, and I need a search typed without accents to find the accented words too.
For example, the database contains the following data:
'técnico a'
'tecnico b'
When I enter 'tec' in the where clause, I need both rows to appear. How can I make the query ignore the accents? The result should be:
técnico a,
tecnico b
I have this code:
predicatesList.add((builder.like(builder.lower(root.<String>get("descripcion")),
'%' + (descripcion.toLowerCase().trim() + '%'))));
Thank you in advance.

The point is that the characters 'é' and 'e' are different, so you cannot find both by searching for one of them. Here is an idea: give your table two fields for the description. Call one "description" and the other "normalized_description". In "description", store the original value; in "normalized_description", store the value with every accented character replaced by its non-accented equivalent. In your case the records will look like this:
Record a:
description: 'técnico a'
normalized_description: 'tecnico a'
Record b:
description: 'tecnico b'
normalized_description: 'tecnico b'
Then, if you need to find both, search by the field "normalized_description", and if you need a specific value, search by "description".
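One way to fill and query such a column is sketched below, using java.text.Normalizer from the JDK. The class name AccentFolding, the helper names and the entity attribute normalizedDescripcion mentioned in the comment are assumptions made for illustration; they do not come from the question.

```java
import java.text.Normalizer;
import java.util.Locale;

class AccentFolding {
    // Decompose accented characters (NFD) and drop the combining marks,
    // so "técnico" becomes "tecnico".
    static String stripAccents(String text) {
        return Normalizer.normalize(text, Normalizer.Form.NFD)
                .replaceAll("\\p{M}", "");
    }

    // Build the LIKE pattern for a user-supplied search term.
    static String toLikePattern(String searchTerm) {
        return "%" + stripAccents(searchTerm.trim()).toLowerCase(Locale.ROOT) + "%";
    }

    // Hypothetical use with the predicate from the question, assuming the
    // entity has an attribute mapped to the normalized column:
    //   builder.like(builder.lower(root.<String>get("normalizedDescripcion")),
    //                AccentFolding.toLikePattern(descripcion))
}
```

The same stripAccents call would be applied once when a record is saved, so that the stored normalized value and the search term are folded in the same way.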

Related

how to match search normal letters with accent letters using JAVA and SQLITE3 [duplicate]

I am new to Android and I'm working on a query in SQLite.
My problem appears when the strings contain accents, e.g.
ÁÁÁ
ááá
ÀÀÀ
ààà
aaa
AAA
If I do:
SELECT * FROM TB_MOVIE WHERE MOVIE_NAME LIKE '%a%' ORDER BY MOVIE_NAME;
It returns:
AAA
aaa (it ignores the others)
But if I do:
SELECT * FROM TB_MOVIE WHERE MOVIE_NAME LIKE '%à%' ORDER BY MOVIE_NAME;
It returns:
ààà (it ignores the title "ÀÀÀ")
I want to select strings in a SQLite DB regardless of accents and case. Please help.
Generally, string comparisons in SQL are controlled by column or expression COLLATE rules. In Android, only three collation sequences are pre-defined: BINARY (default), LOCALIZED and UNICODE. None of them is ideal for your use case, and the C API for installing new collation functions is unfortunately not exposed in the Java API.
To work around this:
Add another column to your table, for example MOVIE_NAME_ASCII
Store values into this column with the accent marks removed. You can remove accents by normalizing your strings to Unicode Normal Form D (NFD) and removing non-ASCII code points since NFD represents accented characters roughly as plain ASCII + combining accent markers:
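import java.text.Normalizer;
// NFD splits "á" into "a" plus a combining accent; the regex then drops every non-ASCII code point.
String asciiName = Normalizer.normalize(unicodeName, Normalizer.Form.NFD)
        .replaceAll("[^\\p{ASCII}]", "");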
Do your text searches on this ASCII-normalized column but display data from the original unicode column.
In Android sqlite, LIKE and GLOB ignore both COLLATE LOCALIZED and COLLATE UNICODE (they only work for ORDER BY). However, there is a solution without having to add extra columns to your table. As @asat explains in this answer, you can use GLOB with a pattern that will replace each letter with all the available alternatives of that letter. In Java:
public static String addTildeOptions(String searchText) {
    return searchText.toLowerCase()
            .replaceAll("[aáàäâã]", "\\[aáàäâã\\]")
            .replaceAll("[eéèëê]", "\\[eéèëê\\]")
            .replaceAll("[iíìî]", "\\[iíìî\\]")
            .replaceAll("[oóòöôõ]", "\\[oóòöôõ\\]")
            .replaceAll("[uúùüû]", "\\[uúùüû\\]")
            .replace("*", "[*]")
            .replace("?", "[?]");
}
And then (not literally like this, of course):
SELECT * from table WHERE lower(column) GLOB "*addTildeOptions(searchText)*"
This way, for example in Spanish, a user searching for either mas or más will get the search converted into m[aáàäâã]s, returning both results.
It is important to notice that GLOB ignores COLLATE NOCASE, that's why I converted everything to lower case both in the function and in the query. Notice also that the lower() function in sqlite doesn't work on non-ASCII characters - but again those are probably the ones that you are already replacing!
The function also replaces both GLOB wildcards, * and ?, with "escaped" versions.
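As a rough sketch of how the pattern could be passed to the query, the snippet below wraps the function in the leading and trailing * wildcards and leaves the value to be bound as a query argument instead of being concatenated into the SQL text. The class name GlobSearch and the method toGlobPattern are made up for this illustration; addTildeOptions is repeated from above so the snippet compiles on its own.

```java
class GlobSearch {
    // Same function as above: expands each vowel into a GLOB character class
    // and neutralizes the GLOB wildcards * and ?.
    static String addTildeOptions(String searchText) {
        return searchText.toLowerCase()
                .replaceAll("[aáàäâã]", "\\[aáàäâã\\]")
                .replaceAll("[eéèëê]", "\\[eéèëê\\]")
                .replaceAll("[iíìî]", "\\[iíìî\\]")
                .replaceAll("[oóòöôõ]", "\\[oóòöôõ\\]")
                .replaceAll("[uúùüû]", "\\[uúùüû\\]")
                .replace("*", "[*]")
                .replace("?", "[?]");
    }

    // Surround the expanded text with * so it matches anywhere in the column.
    static String toGlobPattern(String searchText) {
        return "*" + addTildeOptions(searchText) + "*";
    }

    // On Android the pattern would then be bound as a selection argument, e.g.
    //   db.rawQuery("SELECT * FROM TB_MOVIE WHERE lower(MOVIE_NAME) GLOB ?",
    //               new String[] { GlobSearch.toGlobPattern(searchText) });
}
```

Binding the pattern with ? avoids having to quote the search text inside the SQL string.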
You can use Android NDK to recompile the SQLite source including the desired ICU (International Components for Unicode).
Explained in Russian here:
http://habrahabr.ru/post/122408/
The process of compiling the SQLite source with ICU is explained here:
How to compile sqlite with ICU?
Unfortunately you will end up with different APKs for different CPUs.
You need to look at these, not as accented characters, but as entirely different characters. You might as well be looking for a, b, or c. That being said, I would try using a regex for it. It would look something like:
SELECT * from TB_MOVIE WHERE MOVIE_NAME REGEXP '.*[aAàÀ].*' ORDER BY MOVIE_NAME;

Nrs in Endeca query is not fetching results when we give encoded value along with English character in url

We use Endeca to fetch records because there are so many of them. A data table on the front end displays the records returned by the Endeca query.
When we filter the results using the checkbox values on the front end, the query appends an Nrs attribute to get the filtered results. For any Chinese, Russian, or special characters, we encode them and build the query. Example:
N=0&Ntk=All&Ntx=mode+matchall&Ntt=rumtek&Nrs=collection()/record[(customerName="%22RUMTEK%22+LTD.")]&No=0&Ns=,Endeca.stratify(collection()/record[not%20(invoiceDate)])||invoiceDate|1||,Endeca.stratify(collection()/record[not%20(invoiceNumber)])||invoiceNumber|1
In the query above, results are fetched for the value "rumtek", and we apply a filter with the value ""RUMTEK" LTD.". After encoding, the filter value becomes "%22RUMTEK%22+LTD.". This query returns no results.
Results are returned when we pass either a completely encoded term (for example, the encoded value of a Chinese word) or a plain English word. No results are returned for terms that contain double quotes ("), for example "ABC" LTD., or an ampersand, such as AB&C (AB%26C).
One more question: what if we have made AB a stop word (a word that is not searched)? If we search for AB&C, would it search for AB&C, or would it treat the entire term as a stop word?
Any suggestions will be appreciated.
Thanks in advance.
First, you need to make sure that your Nrs parameter is entirely and properly URL encoded. Second, you need to make sure you properly escape your double quotes because you want to match against them.
As you said, your data contains some record whose customerName property is (without brackets) ["RUMTEK" LTD.]. According to the MDEX Development Guide, to use double quotes as a literal value you need to escape it by prepending it with a double quote character (how confusing!). So, in order to match on this, you would need to have a query string like (separated into lines for readability):
N=0&
Ntk=All&
Ntx=mode+matchall&
Ntt=rumtek&
Nrs=collection()/record[(customerName="""RUMTEK"" LTD.")]&
No=0&
Ns=,Endeca.stratify(collection()/record[not%20(invoiceDate)])||invoiceDate|1||,Endeca.stratify(collection()/record[not%20(invoiceNumber)])||invoiceNumber|1
Now, it isn't ready yet. You need to URL encode the ENTIRE Nrs parameter value. So it would become:
N=0&
Ntk=All&
Ntx=mode+matchall&
Ntt=rumtek&
Nrs=collection%28%29%2Frecord%5B%28customerName%3D%22%22%22RUMTEK%22%22+LTD.%22%29%5D&
No=0&
Ns=,Endeca.stratify(collection()/record[not%20(invoiceDate)])||invoiceDate|1||,Endeca.stratify(collection()/record[not%20(invoiceNumber)])||invoiceNumber|1
That should get you what you need without having to resort to wildcard queries.
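If the query string is assembled in Java, the two steps (doubling the quotes, then URL encoding the whole Nrs value) could look roughly like the sketch below. The class NrsEncoder and its methods are hypothetical helpers, not part of any Endeca API; URLEncoder.encode(String, Charset) needs Java 10 or later.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;

class NrsEncoder {
    // Step 1: double every double quote so the record filter reads it as a
    // literal character, then wrap the value in quotes.
    static String quoteLiteral(String value) {
        return "\"" + value.replace("\"", "\"\"") + "\"";
    }

    // Step 2: URL encode the ENTIRE filter expression, not just the value.
    static String buildNrs(String property, String value) {
        String filter = "collection()/record[(" + property + "=" + quoteLiteral(value) + ")]";
        return URLEncoder.encode(filter, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        System.out.println("Nrs=" + buildNrs("customerName", "\"RUMTEK\" LTD."));
    }
}
```

For the customerName example this produces the same encoded value as the Nrs line shown above.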

Retrieve selected strings from a list of words using Java

I would like to know whether there is any way to retrieve selected words from a string. For example, I would like to retrieve the bold words from my text file.
Below is the text from which I would like to retrieve the words:
Item [00001: Chemical Engineering by R C Lodeen:
on loan to borrower 001: Mr X (41, NX4 4XZ)] borrowed by [001: Mr X (41, NX4 4XZ)]; timestamp 1418119171904
I have tried searching on Google, but the solutions I found didn't help. Can anyone guide me on how to solve this problem?
You will need to define a set of criteria in order to determine which words you want to retrieve, such as "Retrieve all characters that follow a '[' symbol until a ':' is found". After you have defined such a set, use regular expressions to select your words.
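As an illustration of that example rule only (the bold words from the question are not recoverable here, so the rule itself is an assumption), a minimal Java sketch with java.util.regex could look like this. The class name BracketExtractor is made up for the example.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class BracketExtractor {
    // Matches '[' and captures everything up to, but not including, the next ':'.
    private static final Pattern AFTER_BRACKET = Pattern.compile("\\[([^:]*):");

    static List<String> extract(String line) {
        List<String> found = new ArrayList<>();
        Matcher matcher = AFTER_BRACKET.matcher(line);
        while (matcher.find()) {
            found.add(matcher.group(1)); // group 1 is the text between '[' and ':'
        }
        return found;
    }
}
```

Applied to the sample line from the question, this rule picks out the two identifiers that follow an opening bracket, 00001 and 001. A different rule would need a different pattern.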

Search database table with all special characters

I have a project table with a project name column. The name may contain special characters, alphanumeric values, or any combination of numbers, words and special characters.
I need to apply a keyword search to that column, and the search term may itself contain any special character.
So my question is: how can we search for single or multiple special characters in the database?
I am using MySQL 5.0 with the Java Hibernate API.
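This should be possible with some simple sanitization of your query.
e.g. a search for \#(%*#$\ becomes:
SELECT * FROM foo WHERE name LIKE "%\\\\#(\%*#$\\\\%";
Each literal backslash is written four times because it is unescaped twice: once when MySQL parses the string literal, and once by LIKE, whose default escape character is also the backslash. The % inside the term is written as \% so that LIKE treats it as a literal character instead of a wildcard. When evaluated, the search ends up matching anything that contains "\#(%*#$\".
In general, any character that is special in a string can be escaped with a backslash. This only really becomes tricky when the name itself contains backslashes, such as "\\foo\\bar\\", because every one of them has to be doubled again at each level of escaping.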
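Since the question mentions Hibernate, a small helper along these lines could do the LIKE-level escaping in Java. This is a sketch under assumptions: it relies on MySQL's default LIKE escape character (the backslash), and the names LikeEscaper, Project and name in the comment are hypothetical. When the pattern is bound as a query parameter, the string-literal level of escaping no longer applies, so each special character only needs to be escaped once.

```java
class LikeEscaper {
    // Escape the characters LIKE treats specially. The backslash must be
    // handled first, otherwise the escapes added for % and _ would be doubled.
    static String escapeLike(String term) {
        return term
                .replace("\\", "\\\\")
                .replace("%", "\\%")
                .replace("_", "\\_");
    }

    // Pattern for a "contains" search on the escaped term.
    static String containsPattern(String term) {
        return "%" + escapeLike(term) + "%";
    }

    // Hypothetical use with Hibernate, binding the pattern as a parameter:
    //   session.createQuery("from Project p where p.name like :pattern")
    //          .setParameter("pattern", LikeEscaper.containsPattern(keyword))
    //          .list();
}
```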
A side note: please proofread your posts before finalizing them. It's really depressing, and shows a lack of effort, when your question's title has spelling errors in it.

Catch-all second alternative for my start rule

I'm trying to write an ANTLR grammar for a little query language. Queries are a list of search terms restricted to specific fields:
field1:a field2:b field3:c
That's supposed to return a list of entities where field1 matches a, field2 matches b, and so on. Queries can also be completely unrestricted:
abc
That's supposed to return entities with any field that matches abc. Here's the ANTLR grammar:
@members {
String unrestrictedQuery;
}
FIELD1_OPERATOR: 'field1:';
FIELD2_OPERATOR: 'field2:';
FIELD3_OPERATOR: 'field3:';
DIGIT: '0'..'9';
LETTER: 'A'..'Z' | 'a'..'z';
query: subquery (' ' subquery)*
| UNRESTRICTED_QUERY=.* {unrestrictedQuery = $UNRESTRICTED_QUERY.text;}
;
I want unrestricted queries to be any text that doesn't match the query rule's first alternative.
1) Is there a better way to grab the text that the second alternative matched?
2) When I plug this into my web server, the unrestrictedQuery parser field resolves to the last character of the query. It seems like the action gets called for every character of the query when I really want the whole string.
Thanks for reading!
"I want unrestricted queries to be any text that doesn't match the query rule's first alternative".
This is a bad design decision. What if you want to add field4 in the future? Queries that used to be unrestricted could then silently change meaning, which breaks compatibility. It is better to change the grammar so that unrestricted queries are easy to recognize. Surround the field values (a, b, c) with quotes, or start an unrestricted query with a colon:
field1:a :abc field2:b
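To show how the colon-prefixed form removes the ambiguity, here is a hand-written splitter in plain Java. It is not an ANTLR grammar, and the class QuerySplitter and its conventions (unrestricted terms stored under the empty field name, terms without a colon rejected) are assumptions made for this sketch.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

class QuerySplitter {
    // Unrestricted terms (":abc") are stored under the empty field name.
    static final String UNRESTRICTED = "";

    static Map<String, List<String>> parse(String query) {
        Map<String, List<String>> terms = new LinkedHashMap<>();
        for (String token : query.trim().split("\\s+")) {
            int colon = token.indexOf(':');
            if (colon < 0) {
                throw new IllegalArgumentException("Missing ':' in term: " + token);
            }
            String field = token.substring(0, colon);  // empty for ":abc"
            String value = token.substring(colon + 1);
            terms.computeIfAbsent(field, key -> new ArrayList<>()).add(value);
        }
        return terms;
    }
}
```

Because every term carries a colon, adding a new field name later does not change how existing queries are interpreted.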
