Match string with normal characters and special characters in Spring - java

I'm trying to match user search queries against database records in a search engine built with Spring, but I'm having trouble when the search query includes special characters such as accented vowels.
E.g.: search query = 'cafe', database record = 'café'.
I'm using word stems to match the query against the database records.
What would be the most straightforward way to match a query containing a special character ('café') with a string that doesn't contain it ('cafe'), and vice versa?
UPDATE
All the information I need is already cached, so creating a new column in the database is not very appealing. I'm looking for a more Spring-based solution.

You could use java.text.Normalizer, as follows:
import java.text.Normalizer;
import java.text.Normalizer.Form;
public static String removeAccents(String text) {
    return text == null ? null :
            Normalizer.normalize(text, Form.NFD)
                    .replaceAll("\\p{InCombiningDiacriticalMarks}+", "");
}
With Form.NFD, the Normalizer decomposes each accented character into two code points: the base letter and a combining accent.
For example, the character á (U+00E1) is split into a (U+0061) and the combining acute accent (U+0301).
The \p{InCombiningDiacriticalMarks}+ regular expression matches all such combining diacritic code points, and they are replaced with an empty string.
On the database side, your query could look like this:
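Since the UPDATE says the data is already cached, one option is to fold both the query and the cached records with the same normalization and compare them in memory, with no extra column. The sketch below assumes the records are available as a List<String>; the class and method names (AccentInsensitiveSearch, fold, search) are hypothetical, and lower-casing with Locale.ROOT is an added assumption so that matching is also case-insensitive. It is plain Java, so it could be called from any Spring service.

```java
import java.text.Normalizer;
import java.util.List;
import java.util.Locale;
import java.util.stream.Collectors;

// Hypothetical helper: accent- and case-insensitive matching over records
// that are already cached in memory (assumes non-null records and query).
class AccentInsensitiveSearch {

    // Same idea as removeAccents: decompose (NFD), drop combining marks,
    // then lower-case so "Café" and "cafe" fold to the same key.
    static String fold(String text) {
        if (text == null) {
            return null;
        }
        String stripped = Normalizer.normalize(text, Normalizer.Form.NFD)
                .replaceAll("\\p{InCombiningDiacriticalMarks}+", "");
        return stripped.toLowerCase(Locale.ROOT);
    }

    // Returns the cached records whose folded form contains the folded query.
    static List<String> search(List<String> cachedRecords, String query) {
        String foldedQuery = fold(query);
        return cachedRecords.stream()
                .filter(record -> fold(record).contains(foldedQuery))
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        // \u00e9 is é; written as an escape so the source encoding does not matter.
        List<String> records = List.of("caf\u00e9", "cafeteria", "tea");
        System.out.println(search(records, "cafe"));
        System.out.println(search(records, "caf\u00e9"));
    }
}
```

Both calls return the same two records, because the query and the records are folded the same way in both directions.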
SQL Server
SELECT * FROM Table
WHERE Column LIKE '%stringwithoutaccents%' COLLATE Latin1_General_CI_AI
Oracle (from 10g)
SELECT * FROM Table
WHERE NLSSORT(Column, 'NLS_SORT = Latin_AI')
LIKE NLSSORT('%stringwithoutaccents%', 'NLS_SORT = Latin_AI')
The CI stands for "Case Insensitive" and AI for "Accent Insensitive".
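The same case-insensitive and accent-insensitive idea is also available in plain Java through java.text.Collator, which may suit the cached data mentioned in the UPDATE. At PRIMARY strength, accent differences (secondary) and case differences (tertiary) are ignored. This is a sketch with hypothetical names (AccentInsensitiveEquals, equalsIgnoreAccents), and it only covers whole-string equality, not LIKE-style substring matching.

```java
import java.text.Collator;
import java.util.Locale;

// Hypothetical helper: an in-memory analogue of a "CI_AI" collation.
class AccentInsensitiveEquals {

    private static final Collator COLLATOR = createCollator();

    private static Collator createCollator() {
        Collator collator = Collator.getInstance(Locale.ROOT);
        // PRIMARY: only base letters count; accents and case are ignored.
        collator.setStrength(Collator.PRIMARY);
        // Decompose é into e + combining accent before comparing.
        collator.setDecomposition(Collator.CANONICAL_DECOMPOSITION);
        return collator;
    }

    static boolean equalsIgnoreAccents(String a, String b) {
        return COLLATOR.compare(a, b) == 0;
    }

    public static void main(String[] args) {
        System.out.println(equalsIgnoreAccents("cafe", "CAF\u00c9")); // true
        System.out.println(equalsIgnoreAccents("cafe", "cave"));      // false
    }
}
```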
I hope it helps you.

Related

Get list of parameter names from native sql expression (regex)

I'm having trouble getting a list of all parameters in an SQL query using a regex.
Example of the query:
SELECT ... WHERE col1 = :user AND col2 = 'HELLO' OR col3 = :language
To obtain the parameters, I use the following regex pattern:
Pattern.compile(":([\\w.$]+|\"[^\"]+\"|'[^']+')", Pattern.MULTILINE)
The pattern returns list of parameters correctly:
:user
:language
The problem is with another type of query, where string literals may contain the character ':'
WHERE col1 = :user AND some_date > '2022-09-26T10:22:55'
The list of parameters for this case is:
:user
:22
:55
Is there any better approach that will not consider contents of literals as parameters?
You could simplify the problem by assuming that a named parameter in SQL is just a word prefixed with : that always follows a space. This is not actually a requirement, nor is it always true, but it may be good enough to get acceptable results with the simplest possible regex:
Pattern.compile(" :\\w+", Pattern.MULTILINE)
--
Summary of the comments: the regex had to match
- foo = :param AND :param = bar AND foo=:param AND :param=bar
- AND FUNC(:param) OR FUNC(0, :param) OR FUNC(:param, 0)
In the end, this regex with a fixed-length lookbehind and a variable-length lookahead did the job:
Pattern.compile("(?<=[=(])\\s*:[\\w_.]+|:[\\w_.]+(?=\\s*[=)])", Pattern.MULTILINE)
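A different technique, not taken from the thread above, avoids guessing from the surrounding characters: match string literals and parameters in one alternation, so each literal is consumed whole and its contents can never be reported as a parameter. This is a sketch under stated assumptions: literals are single-quoted with '' as the escaped quote, identifiers may be double-quoted, parameter names are \w+ only, and SQL comments are not handled. NamedParameterExtractor and extract are hypothetical names.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper: named parameters outside of literals/quoted identifiers.
class NamedParameterExtractor {

    // Alternative 1: single-quoted literal ('' is an escaped quote) -> skipped.
    // Alternative 2: double-quoted identifier -> skipped.
    // Alternative 3: :name, captured in group 1; (?<!:) skips casts like ::date.
    private static final Pattern TOKEN =
            Pattern.compile("'(?:[^']|'')*'|\"[^\"]*\"|(?<!:):(\\w+)");

    static List<String> extract(String sql) {
        List<String> names = new ArrayList<>();
        Matcher m = TOKEN.matcher(sql);
        while (m.find()) {
            if (m.group(1) != null) { // only the parameter alternative sets group 1
                names.add(m.group(1));
            }
        }
        return names;
    }

    public static void main(String[] args) {
        String sql = "WHERE col1 = :user AND some_date > '2022-09-26T10:22:55' OR col3 = :language";
        System.out.println(extract(sql)); // [user, language]
    }
}
```

Because the regex engine scans left to right, the literal '2022-09-26T10:22:55' is matched by the first alternative before :22 or :55 can be considered.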

Unable to capture next line character in Java

I need to parse a Python file that contains multiple SQL queries and, using Java, get the start and end positions of each query so that I can extract only the query part.
I use the .contains() method to detect sql(''' as the opening delimiter of a query and ''') as the closing delimiter, but in some cases ''') appears inside the query where a variable is concatenated, and that occurrence should not be detected as the end of the query.
Something like this:
spark.sql(''' SELECT .......
FROM.....
WHERE xxx IN ('''+ Variable +''')
''')
Here the second-to-last line is also detected as the end of the query if I use line.contains(" ''') "), which is wrong.
The only idea I have is to check for a newline character after the closing delimiter, since the queries are separated by two empty lines. So I tried line.contains(" ''')\n") and line.contains(" ''')\r\n"), but neither of them works.
Please let me know if there is another way to do this.
Note that I do not have the privilege to change the query file.
Thanks
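I don't think a simple contains() call can solve this problem.
If you want to match \n, you will have to use a Pattern.
import java.util.regex.Pattern;
public class Main {
    public static void main(String[] args) {
        String query = "spark.sql(''' SELECT .......\n" +
                "FROM..... \n" +
                "WHERE xxx IN ('''+ Variable +''')\n" +
                "''')";
        // "\\." matches a literal dot; an unescaped "." would match any character.
        Pattern pattern = Pattern.compile("^spark\\.sql\\('''(.*)'''\\)$", Pattern.DOTALL);
        System.out.println(pattern.matcher(query).find());
    }
}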
Output:
true
Pattern.DOTALL tells Java to allow the dot to match newline characters, too.
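For a file with several queries, the greedy (.*) above would run from the first spark.sql(''' to the last ''') in the file. A possible extension, assuming (as the question states) that queries are separated by blank lines: use a lazy body and accept a closing ''') only when a blank line or the end of the input follows it. SparkSqlExtractor and extract are hypothetical names; this is a sketch, not a full Python parser.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper: extract every spark.sql('''...''') body from file content.
class SparkSqlExtractor {

    private static final Pattern QUERY = Pattern.compile(
            "spark\\.sql\\('''(.*?)'''\\)"           // lazy body up to a candidate ''')
            + "(?=\\r?\\n[ \\t]*\\r?\\n|\\s*\\z)",   // accepted only before a blank line or EOF
            Pattern.DOTALL);

    static List<String> extract(String fileContent) {
        List<String> queries = new ArrayList<>();
        Matcher m = QUERY.matcher(fileContent);
        while (m.find()) {
            queries.add(m.group(1).trim());
        }
        return queries;
    }

    public static void main(String[] args) {
        String content = "spark.sql(''' SELECT a\nFROM t\nWHERE x IN ('''+ v +''')\n''')\n\n\n"
                + "spark.sql(''' SELECT b FROM u ''')\n";
        extract(content).forEach(q -> System.out.println(q + "\n---"));
    }
}
```

The inner +''') is followed by a line that starts with ''') rather than by a blank line, so the lookahead rejects it and the lazy body keeps growing. On a real file, read the whole content into one string (for example with Files.readString(path) on Java 11+) instead of line by line: BufferedReader.readLine() returns each line without its line terminator, which is why line.contains(" ''')\n") never matched.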

How can I use Lithuanian special letters in Java

I want to filter a table and check a string with Selenium. The string on the web page contains a Lithuanian special letter, so I get something like "M?nesis" instead of "Mėnesis":
ElementsCollection activePlans = $$(".view-content .tile__title").filterBy(text("Mėnesis"));
How can I do that?
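The question has no answer here. A "?" in place of ė usually means the text was decoded with a charset that lacks the character somewhere along the way; a common cause (an assumption, since the question does not say) is compiling the .java source with a non-UTF-8 encoding. One way to make the literal independent of the source encoding is a Unicode escape: ė is U+0117 (LATIN SMALL LETTER E WITH DOT ABOVE). LithuanianText and MENESIS are hypothetical names.

```java
// Hypothetical holder: "Mėnesis" written with a Unicode escape, so the literal
// does not depend on the encoding used to read and compile this source file.
class LithuanianText {

    // \u0117 is LATIN SMALL LETTER E WITH DOT ABOVE (ė).
    static final String MENESIS = "M\u0117nesis";

    public static void main(String[] args) {
        System.out.println(MENESIS.length());                            // 7
        System.out.println(Integer.toHexString(MENESIS.codePointAt(1))); // 117
    }
}
```

It could then be used as filterBy(text(LithuanianText.MENESIS)). The alternative is to keep the literal as "Mėnesis" and make sure the compiler reads sources as UTF-8 (javac -encoding UTF-8, or the project.build.sourceEncoding property in Maven).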

JPA Select query not returning results with one letter word

My query returns no results when the search term starts with a one-letter word followed by a space and another word (e.g. "T Distribution"), whereas searching for "Distribution" alone returns results, including the results for "T Distribution". The same happens with every search term that starts with a one-letter word followed by a space and another word.
The problem appears when the search term has this pattern:
"[one-letter][space][letter/word]", e.g. "o ring".
Why does the LIKE operator not work correctly in this case?
Here is my query:
@Cacheable(value = "filteredConcept")
@Query("SELECT NEW sina.backend.data.model.ConceptSummaryVer04(s.id, s.arabicGloss, s.englishGloss, s.example, s.dataSourceId, "
        + "s.synsetFrequnecy, s.arabicWordsCache, s.englishWordsCache, s.superId, s.categoryId, s.dataSourceCacheAr, s.dataSourceCacheEn, "
        + "s.superTypeCasheAr, s.superTypeCasheEn, s.area, s.era, s.rank, s.undiacritizedArabicWordsCache, s.normalizedEnglishWordsCache, "
        + "s.isTranslation, s.isGloss, s.arabicSynonymsCount, s.englishSynonymsCount) FROM Concept s "
        + "where s.undiacritizedArabicWordsCache LIKE %:searchTerm% AND data_source_id != 200 AND data_source_id != 31")
List<ConceptSummaryVer04> findByArabicWordsCacheAndNotConcept(@Param("searchTerm") String searchTerm, Sort sort);
Running the query directly on the database does return results: [screenshot not included]
On the database, results are returned regardless of letter case: [screenshot not included]
I solved this problem.
It was caused by the configuration of the MySQL full-text index: the minimum indexed word length was 2 (ft_min_word_len = 2), so one-letter words were not indexed.
I changed that value and rebuilt the index. After that, the query returned results for one-letter words.
12.9.6 Fine-Tuning MySQL Full-Text Search
Use some quotes:
LIKE '%:searchTerm%';
Set searchTerm to "%your_word%" and use it in the query like this:
... s.undiacritizedArabicWordsCache LIKE :searchTerm ...
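A small helper can build that value. The sketch below also escapes the LIKE wildcards % and _ that may appear in user input, so they match literally; it assumes backslash is the LIKE escape character, which is MySQL's default (other databases may need an explicit ESCAPE clause). LikePatterns and containsPattern are hypothetical names.

```java
// Hypothetical helper: builds the value bound to :searchTerm in "... LIKE :searchTerm".
// Assumes backslash is the LIKE escape character (MySQL default).
class LikePatterns {

    static String containsPattern(String term) {
        String escaped = term
                .replace("\\", "\\\\") // escape the escape character first
                .replace("%", "\\%")   // literal percent sign
                .replace("_", "\\_");  // literal underscore
        return "%" + escaped + "%";
    }

    public static void main(String[] args) {
        System.out.println(containsPattern("o ring")); // %o ring%
        System.out.println(containsPattern("50%"));    // %50\%%
    }
}
```

The repository method would then be called with containsPattern(userInput) as the searchTerm argument, and the query would use LIKE :searchTerm without the surrounding % signs.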

Mongo Query Parser Regex

I want to use a regex to extract the parts of a Mongo query (name, find, sort, limit) by splitting it at each dot (.).
Input:
db.metrics.find(
{
"brand_name":"Apple",
"job_status.status":"SUCCESS",
'host.user':'root',
"current_time":{$gt:new Date(Date.now() - 3*60*60 * 1000)}
}
).sort({"current_time" : -1}).limit(10)
With the help of two or three Stack Overflow answers, I have built the regex below:
regex = `\.(?=(([^']*'){2})*[^']*$)(?=(([^\"]*\"){2})*[^\"]*$)(?![^()]*\\)`
It solves my use case to a certain extent,
but I am not able to ignore the dot (.) inside the curly braces (Date.now()).
[screenshot of the matches on regexr.com not included]
I need a regex that ignores the .now() part of the query above.
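Java's regex engine has no recursion, so a single regex cannot reliably track arbitrarily nested braces and parentheses. A swapped-in technique that sidesteps the problem: scan the query once, tracking quotes and nesting depth, and split only on dots at depth 0. This is a sketch under the assumption that quotes use ' or " with backslash escapes; MongoQuerySplitter and splitTopLevel are hypothetical names, and it does no validation of malformed input.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper: split a Mongo shell query on dots that are outside
// quotes and outside (), {} and [].
class MongoQuerySplitter {

    static List<String> splitTopLevel(String query) {
        List<String> parts = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        int depth = 0;  // nesting level of (, { and [
        char quote = 0; // active quote character, or 0 outside a string
        for (int i = 0; i < query.length(); i++) {
            char c = query.charAt(i);
            if (quote != 0) {
                current.append(c);
                if (c == '\\' && i + 1 < query.length()) {
                    current.append(query.charAt(++i)); // keep the escaped char as-is
                } else if (c == quote) {
                    quote = 0; // closing quote
                }
                continue;
            }
            if (c == '\'' || c == '"') {
                quote = c;
            } else if (c == '(' || c == '{' || c == '[') {
                depth++;
            } else if (c == ')' || c == '}' || c == ']') {
                depth--;
            } else if (c == '.' && depth == 0) {
                parts.add(current.toString().trim()); // top-level separator
                current.setLength(0);
                continue;
            }
            current.append(c);
        }
        parts.add(current.toString().trim());
        return parts;
    }

    public static void main(String[] args) {
        String query = "db.metrics.find({\"host.user\":\"root\", \"t\":{$gt:new Date(Date.now() - 1000)}})"
                + ".sort({\"current_time\" : -1}).limit(10)";
        splitTopLevel(query).forEach(System.out::println);
    }
}
```

For the input in the question this yields db, metrics, the whole find(...) call, sort({"current_time" : -1}) and limit(10); the dots in 'host.user' and Date.now() stay inside their parts.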
