Fuzzy Matching in H2 Database? - java

I was just wondering if there is a simple way to implement fuzzy matching of strings using the H2 Database.
I have a list of names in the database, and I want to be able to search through them using 3 characters that may be found anywhere in the name, in the order the 3 characters are typed in.
I'm not sure if that's even possible, but it would make life much easier if it could be done in the database via SQL rather than in Java.

You could use
select * from test where name like '%xyz%'
See also the documentation of LIKE.
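A minimal JDBC sketch of the LIKE approach, assuming a table named test with a name column as in the query above. It escapes LIKE wildcards in the user's input (backslash as the escape character, declared explicitly with ESCAPE) and shows two pattern shapes: '%xyz%' requires the three characters to be adjacent, while '%x%y%z%' only requires them to appear in that order, with anything in between. The class and method names are illustrative, not part of H2.

```java
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.util.ArrayList;
import java.util.List;

final class NameSearch {
    /** Escapes LIKE wildcards so user input is matched literally (escape char: backslash). */
    static String escapeLike(String s) {
        StringBuilder sb = new StringBuilder();
        for (char c : s.toCharArray()) {
            if (c == '\\' || c == '%' || c == '_') {
                sb.append('\\');
            }
            sb.append(c);
        }
        return sb.toString();
    }

    /** '%xyz%': the typed characters must appear next to each other. */
    static String containsPattern(String typed) {
        return "%" + escapeLike(typed) + "%";
    }

    /** '%x%y%z%': the typed characters must appear in order, possibly with gaps. */
    static String inOrderPattern(String typed) {
        StringBuilder sb = new StringBuilder("%");
        for (char c : typed.toCharArray()) {
            sb.append(escapeLike(String.valueOf(c))).append('%');
        }
        return sb.toString();
    }

    /** Runs the search against the (assumed) table TEST with column NAME. */
    static List<String> search(Connection conn, String pattern) throws SQLException {
        String sql = "SELECT name FROM test WHERE name LIKE ? ESCAPE '\\'";
        List<String> names = new ArrayList<>();
        try (PreparedStatement ps = conn.prepareStatement(sql)) {
            ps.setString(1, pattern);
            try (ResultSet rs = ps.executeQuery()) {
                while (rs.next()) {
                    names.add(rs.getString(1));
                }
            }
        }
        return names;
    }
}
```

For example, inOrderPattern("ace") matches "Janice", which containsPattern("ace") does not. LIKE is case-sensitive in H2 by default; comparing LOWER(name) against a lowercased pattern is one way around that.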
Another option is to use SOUNDEX:
select * from test where soundex(name) = soundex('word')
In both cases, an index cannot be used. That means the query is slow if the table has many rows, because each row must be checked.
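To show what SOUNDEX(name) = SOUNDEX('word') actually compares, here is a sketch of the classic American Soundex code in Java: keep the first letter, map the remaining consonants to digits, drop vowels, collapse adjacent equal digits, and pad to four characters. This illustrates the general algorithm and is not H2's source; H2's built-in function may differ in edge cases such as the H/W rule or non-ASCII letters.

```java
import java.util.Locale;

final class Soundex {
    /** Classic American Soundex: first letter plus three digits. */
    static String encode(String word) {
        String upper = word.toUpperCase(Locale.ROOT);
        StringBuilder out = new StringBuilder();
        char lastCode = 0;
        for (int i = 0; i < upper.length() && out.length() < 4; i++) {
            char c = upper.charAt(i);
            if (c < 'A' || c > 'Z') {
                continue; // ignore digits, spaces, punctuation
            }
            char code = digitFor(c);
            if (out.length() == 0) {
                out.append(c); // the first letter is kept as-is
            } else if (code == '0') {
                // Vowels separate equal codes; H and W do not.
                if (c != 'H' && c != 'W') {
                    lastCode = '0';
                }
                continue;
            } else if (code != lastCode) {
                out.append(code); // adjacent letters with the same digit collapse
            }
            lastCode = code;
        }
        if (out.length() == 0) {
            return ""; // no letters at all
        }
        while (out.length() < 4) {
            out.append('0'); // pad short codes
        }
        return out.toString();
    }

    private static char digitFor(char c) {
        switch (c) {
            case 'B': case 'F': case 'P': case 'V':
                return '1';
            case 'C': case 'G': case 'J': case 'K': case 'Q': case 'S': case 'X': case 'Z':
                return '2';
            case 'D': case 'T':
                return '3';
            case 'L':
                return '4';
            case 'M': case 'N':
                return '5';
            case 'R':
                return '6';
            default:
                return '0'; // A, E, I, O, U, H, W, Y
        }
    }
}
```

Under this scheme "Robert" and "Rupert" both encode to R163, so a SOUNDEX comparison treats them as equal even though they share few characters.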

Related

Database Search with key words using jpa

I'm doing college work where I have to search by keywords. My entity is called Position and I'm using MySQL. The fields that I need to search are:
    - date
    - positionCode
    - title
    - location
    - status
    - company
    - tecnoArea
I need to search for the same word in all of these fields. To do this, I used the Criteria API to create a dynamic query. It is the same word for several fields, and it should return as many results as possible. Do you have any advice on how to optimize the search in the database? Should I run several queries?
EDIT
I will use an OR constraint.
If you need to find the keyword at any position within the data, you will need to use LIKE with wildcards, e.g. title LIKE '%manager%'. Since date and positionCode (presumably a numeric type) are unlikely to contain the keyword, I would omit searching these columns for a small performance gain. Your query is going to need a serial read, which means that all rows in the table will need to be brought into main memory to evaluate and retrieve the result set of your query. Given that a serial read is going to happen anyway, I do not think there is much you can do to optimize the query when searching multiple columns. I am not familiar with the "criteria api to create dynamic queries", but using dynamic queries in other systems is non-optimal: they must be parsed and evaluated every time they are run, and most query optimizers cannot make use of statistics for cost-based optimization the way they can with explicitly defined SQL.
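For reference, the OR-of-LIKE condition described above can be written as JPQL; a Criteria query that combines one LIKE predicate per field with cb.or(...) expresses the same condition. This sketch assumes the five remaining fields are String attributes of Position (an enum or an association such as company would need different handling), lowercases both sides for case-insensitive matching, and reuses one named parameter :kw for every field. The class and method names are hypothetical.

```java
import java.util.List;
import java.util.Locale;
import java.util.stream.Collectors;

final class PositionKeywordQuery {
    // date and positionCode are left out, following the advice above.
    static final List<String> TEXT_FIELDS =
            List.of("title", "location", "status", "company", "tecnoArea");

    /** One case-insensitive LIKE per text field, combined with OR and sharing :kw. */
    static String jpql() {
        String where = TEXT_FIELDS.stream()
                .map(field -> "LOWER(p." + field + ") LIKE :kw")
                .collect(Collectors.joining(" OR "));
        return "SELECT p FROM Position p WHERE " + where;
    }

    /** Value to bind to :kw, e.g. query.setParameter("kw", keywordParam(word)). */
    static String keywordParam(String keyword) {
        return "%" + keyword.toLowerCase(Locale.ROOT) + "%";
    }
}
```

A single query with OR keeps it to one pass over the table, which fits the point above that the serial read happens anyway.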
I'm not sure which database you are using.
If it is Oracle, you can use Oracle Text.
The link below might be useful:
http://swiss-army-development.blogspot.com/2012/02/keyword-search-via-oracle-text.html

Hibernate Search query for class

I'm using Hibernate Search 4.4.0 and recently ran into a problem.
For example, I have 2 classes, INDEXING and DATA_PROPERTY. There is no association between them, and I can't change them or create a new class to associate them.
Part of the Lucene index mapping:
mapping.entity(DatatypeProperty.class).indexed().providedId()
.property("rdfResource",ElementType.FIELD).field().analyze(Analyze.NO).store(Store.YES)
.property("partitionValue", ElementType.FIELD).field().analyze(Analyze.NO);
mapping.entity(Indexing.class).indexed().providedId()
.property("rdfResource",ElementType.FIELD).field().analyze(Analyze.NO).store(Store.YES);
In SQL, I currently use:
SELECT IND.RDF_RESOURCE
FROM INDEXING IND, DATA_PROPERTY DP
WHERE IND.RDF_RESOURCE = DP.RDF_RESOURCE
AND IND.OBJECT_TYPE_ID_INDEXED IN (........)
AND DP.PARTITION_VALUE IN (......)
AND .......
How can I translate IND.RDF_RESOURCE = DP.RDF_RESOURCE into Hibernate Search?
I thought maybe I could use a query to find all the RDF_RESOURCE values of class DatatypeProperty and match all of them in the query for class Indexing. But that seems very inefficient.
Does anyone have a better way to do this?
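The two-step idea in the question boils down to a semi-join on rdfResource. As a sketch of one variation, assuming both Lucene queries can return just the stored rdfResource values (for example via a projection; the field is stored with Store.YES in the mapping above) and that both result sets fit in memory, you could run the two queries separately and intersect the values with a hash set instead of pushing every value into the second query. The class and method names are hypothetical, and this does not avoid the cost of fetching all matching values from both indexes.

```java
import java.util.ArrayList;
import java.util.Collection;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

final class RdfResourceJoin {
    /**
     * Keeps the rdfResource values from the Indexing query that also occur in the
     * DatatypeProperty query, preserving the order of the Indexing results.
     */
    static List<String> semiJoin(Collection<String> indexingResources,
                                 Collection<String> dataPropertyResources) {
        Set<String> lookup = new HashSet<>(dataPropertyResources); // constant-time membership checks
        List<String> matches = new ArrayList<>();
        for (String resource : indexingResources) {
            if (lookup.contains(resource)) {
                matches.add(resource);
            }
        }
        return matches;
    }
}
```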
I have 2 classes INDEXING and DATA_PROPERTY. There is no association
between 2 of them. And I can't change them or create a new class to
associate 2 of them.
In this case you are between a rock and a hard place. You will need to associate the records somehow, and the most obvious choice is via an association. Also, you cannot compare a SQL join with a free-text index provided by Lucene.
One potential solution could be to write a custom bridge which at indexing time executes the join and indexes the relevant data, so that you can target it directly via your query. Whether this works for you will depend on your use case. In your example setup, I don't see any field which would benefit from free text search. I can only assume that you are only showing parts of your code. If not, why don't you just stick with SQL?

Check if a data set is present in a database

I have an array of strings in my Java code. I want to check whether (and which of) the values in that array are present in the MySQL database I am using. The way I have tried to do it is to query the database for each individual value in the array. Is there a more efficient way of doing this?
Thanks in advance.
In my Java code:
String[] arrProducts=new String[]{"AB","BC","CD","AE","fg","BV","etc"};
In the MySQL Products database I have a productsInventory table, which has a productId column. Basically, I want to check whether all of the arrProducts values are present in the productId column, instead of querying individually like: Select * from ProductsInventory where productId like 'AB'.
EDIT1:
So my table looks like this:
ProductsInventory->{ProductName,productId}
Right now I am querying the table with one query for each value in my array.
Eg:
Select * from ProductsInventory where productId like 'AB';
Select * from ProductsInventory where productId like 'BC';
Select * from ProductsInventory where productId like 'CD';
So, depending on the number of elements in my array, I need to send multiple queries.
EDIT 2: My array can change depending on user interaction. What the user enters is stored in my array, and I need to check whether those values are present in the database table.
You can use the IN keyword. Just pass the array values inside the IN list; the resulting query will look like this:
select * from ProductsInventory where productId in ('AB','BC','CD','AE','fg','BV','etc');
But IN can cause considerable performance degradation if the array is very long.
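From Java, one common way to send the whole array in one round trip is a PreparedStatement with one ? placeholder per value, which also avoids quoting problems with user-entered values. A minimal JDBC sketch, assuming the ProductsInventory table and productId column from the question; the class and method names are illustrative:

```java
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

final class ProductLookup {
    /** "?, ?, ?" with one placeholder per value. */
    static String placeholders(int n) {
        return String.join(", ", Collections.nCopies(n, "?"));
    }

    /** Returns the ids that exist in ProductsInventory.productId, using a single query. */
    static Set<String> findPresent(Connection conn, String[] ids) throws SQLException {
        Set<String> present = new HashSet<>();
        if (ids.length == 0) {
            return present; // "IN ()" is not valid SQL
        }
        String sql = "SELECT productId FROM ProductsInventory WHERE productId IN ("
                + placeholders(ids.length) + ")";
        try (PreparedStatement ps = conn.prepareStatement(sql)) {
            for (int i = 0; i < ids.length; i++) {
                ps.setString(i + 1, ids[i]); // JDBC parameters are 1-based
            }
            try (ResultSet rs = ps.executeQuery()) {
                while (rs.next()) {
                    present.add(rs.getString(1));
                }
            }
        }
        return present;
    }

    /** Values from the array that the query did not find. */
    static List<String> missing(String[] ids, Set<String> present) {
        List<String> out = new ArrayList<>();
        for (String id : ids) {
            if (!present.contains(id)) {
                out.add(id);
            }
        }
        return out;
    }
}
```

Since the user controls the array, the placeholder count changes from call to call, which a PreparedStatement handles fine. If productId uses a case-insensitive collation, the database may return a different spelling than the one entered (for example FG for fg), so normalize case on both sides if that matters for the missing check.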
As an alternative to the IN structure, or to regular-expression support with RLIKE, you can use full-text search. MySQL's integrated solution may be limited in speed and functionality, but third-party solutions that integrate nicely with MySQL and Java, like Sphinx or Solr, can provide good performance for complex functionality at the cost of more resources. See also: Full text options in MySQL that allows search in natural language.
You can try something like:
SELECT * from ProductsInventory where productId in ('AB','BC','CD')
It will probably save you time, but depending on what's in the database and how things are structured, it could be quite expensive (time, processing).

Dynamically create search criteria and results using Java and sql database

I am currently working with a Java based web application (JSF) backed by Hibernate that has a variety of different search pages for different areas.
A search page contains a search fields section, where a user can customize the search fields they are interested in. There is a range of different search field types that can be added (exact text, starts with, contains, multi-select list boxes, comma-separated values, and many more). Search fields are not required to be filled in and are ignored when empty, whereas some other search fields only work if a different search field has a value.
We currently use a custom search object per area that is specific to that area and has hard-coded getters and setters for its search fields.
public interface Search {
SearchFieldType getSearchPropertyOne();
void setSearchPropertyOne(SearchFieldType searchPropertyOne);
AnotherSearchFieldType getSearchPropertyTwo();
void setSearchPropertyTwo(AnotherSearchFieldType searchPropertyTwo);
...
}
In this example, SearchFieldType and AnotherSearchFieldType represent different search types like a TextSearchField or a NumericSearchField which has a search type (Starts with, Contains, etc.) or (Greater Than, Equals, Less Than, etc.) respectively and a search value that they can enter or leave empty (to ignore the search field).
We use this search object to prepare a Criteria object.
The search results section is a table that can also be customized by the user to contain only columns of the result object that they are interested in. Most columns can be ordered ascending or descending.
We store each result in a Result object, which also hard-codes the columns that can be displayed. This table is backed by Hibernate annotations, but we try to use flat data instead of referencing other Hibernate-backed objects, to minimize lazy joining of data.
@Entity(table = "result_view")
public interface Result {
@Column(name = "result_field_one")
Long getResultFieldOne();
void setResultFieldOne(Long resultFieldOne);
@Column(name = "result_field_two")
String getResultFieldTwo();
void setResultFieldTwo(String resultFieldTwo);
...
}
The search page is backed by a view in our database, which handles joining all the tables needed for every possible outcome. This view has gotten pretty massive, and we take a huge performance hit on every search, even when a user only wants to search on one field and display a few columns. The reason is that we have upwards of thirty search field options and thirty different columns users can display, all backed by this one view.
On top of this, users request new search fields and columns all the time that they would like added to the page. We end up having to alter the search and result objects as well as the backing view to make these changes.
We are trying to look into this matter and find alternatives to this. One approach mentioned was to create different views that we dynamically choose based on the fields searched on or displayed in the results table. The different views might join different columns and we pick and choose which view we need for any given search.
I'm trying to think about the problem a different way. I think it might be better to not use a view and instead dynamically join tables we need based on what search fields and result columns are requested. I also feel that the search and result objects should not contain hard coded getters/setters and should instead be a collection of search fields and a collection (or map) of result columns. I have yet to completely flesh out my idea.
Would Hibernate still be a valid solution to this issue? I wouldn't want to create a Result object for use in a Hibernate Criteria query, since the result columns can differ. Both search fields and result columns might require joining tables.
Is there a framework I could use that might help solve the problem? I've been trying to look for something, and the closest thing I have found is SqlBuilder.
Has anyone else solved a similar problem dynamically?
I would prefer not to reinvent the wheel if a solution already exists.
I apologize that this ended up as a wall of text. This is my first Stack Overflow post, and I wanted to make sure I thoroughly defined my problem.
Thanks in advance for your answers!
I don't fully understand the problem, but the JPA Criteria API seems very flexible and can be used to build queries based on user-submitted filtering conditions.
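A rough, framework-independent sketch of the idea from the question: describe each searchable or displayable field once (its SQL expression plus the table alias it needs), then build a query that selects only the requested columns, skips empty search fields, and adds only the joins those fields require. The orders/customer/product schema, the aliases, and the class names are all hypothetical; the same catalog could just as well drive the JPA Criteria API instead of SQL text.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

final class DynamicQuery {
    // Hypothetical catalog: logical field name -> {table alias, SQL expression}.
    private static final Map<String, String[]> CATALOG = Map.of(
            "orderId", new String[] {"o", "o.id"},
            "orderDate", new String[] {"o", "o.order_date"},
            "customerName", new String[] {"c", "c.name"},
            "productName", new String[] {"p", "p.name"});

    // Hypothetical join needed to reach each alias other than the root table "o".
    private static final Map<String, String> JOINS = Map.of(
            "c", "JOIN customer c ON c.id = o.customer_id",
            "p", "JOIN product p ON p.id = o.product_id");

    /** An optional "contains" search field; ignored when the value is empty. */
    static final class Filter {
        final String field;
        final String value;

        Filter(String field, String value) {
            this.field = field;
            this.value = value;
        }
    }

    /** Builds the SQL text and appends bind values to params in placeholder order. */
    static String build(List<String> columns, List<Filter> filters, List<Object> params) {
        if (columns.isEmpty()) {
            throw new IllegalArgumentException("No columns requested");
        }
        Set<String> aliases = new TreeSet<>(); // sorted, so join order is deterministic
        List<String> select = new ArrayList<>();
        for (String column : columns) {
            String[] def = lookup(column);
            aliases.add(def[0]);
            select.add(def[1]);
        }
        List<String> where = new ArrayList<>();
        for (Filter filter : filters) {
            if (filter.value == null || filter.value.trim().isEmpty()) {
                continue; // empty search fields are ignored
            }
            String[] def = lookup(filter.field);
            aliases.add(def[0]);
            where.add(def[1] + " LIKE ?");
            params.add("%" + filter.value + "%"); // LIKE wildcards in value are not escaped here
        }
        StringBuilder sql = new StringBuilder("SELECT ")
                .append(String.join(", ", select))
                .append(" FROM orders o");
        for (String alias : aliases) {
            String join = JOINS.get(alias);
            if (join != null) {
                sql.append(' ').append(join); // only tables that are actually needed
            }
        }
        if (!where.isEmpty()) {
            sql.append(" WHERE ").append(String.join(" AND ", where));
        }
        return sql.toString();
    }

    private static String[] lookup(String field) {
        String[] def = CATALOG.get(field);
        if (def == null) {
            throw new IllegalArgumentException("Unknown field: " + field);
        }
        return def;
    }
}
```

For example, requesting orderId and customerName with a blank productName filter and a customerName filter of "smi" produces a query with only the customer join and a single bound value "%smi%". Because field names are looked up in the catalog rather than pasted from user input, unknown names are rejected before they reach the SQL.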

Does using Limit in query using JDBC, have any effect in performance?

If we use a LIMIT clause in a query that also has an ORDER BY clause and execute the query via JDBC, will there be any effect on performance? (Using a MySQL database.)
Example:
SELECT modelName from Cars ORDER BY manuDate DESC Limit 1
I read in one of the threads on this forum that, by default, a set number of rows is fetched at a time. How can I find the default fetch size?
I want only one record. Originally, I was using the following:
SQL Query:
SELECT modelName from Cars ORDER BY manuDate DESC
In the Java code, I was extracting it as follows:
if (resultSet.next()) {
// do something here.
}
LIMIT 1 will definitely have a positive effect on performance. Instead of the entire set of matches (well, depending on the default fetch size) being returned from the DB server to the Java code, only one row will be returned. This saves a lot of network bandwidth and Java memory.
Always delegate as many constraints as possible, like LIMIT, ORDER BY, WHERE, etc., to SQL instead of handling them on the Java side. The DB will do it much better than your Java code ever can (if the table is properly indexed, of course). Try to write the SQL query so that it returns exactly the information you need.
The only disadvantage of writing DB-specific SQL queries is that SQL is not entirely portable among different DB servers, which would require you to change the SQL queries every time you change DB servers. But in the real world it is very rare to switch to a completely different DB make anyway. Externalizing SQL strings to XML or properties files should help a lot in any case.
There are two ways the LIMIT could speed things up:
- by producing less data, which means less data gets sent over the wire and processed by the JDBC client
- by potentially having MySQL itself look at fewer rows
The second one of those depends on how MySQL can produce the ordering. If you don't have an index on manuDate, MySQL will have to fetch all the rows from Cars, then order them, then give you the first one. But if there's an index on manuDate, MySQL can just look at the first entry in that index, fetch the appropriate row, and that's it. (If the index also contains modelName, MySQL doesn't even need to fetch the row after it looks at the index -- it's a covering index.)
With all that said, watch out! If manuDate isn't unique, the ordering is only partially deterministic (the order for all rows with the same manuDate is undefined), and your LIMIT 1 therefore doesn't have a single correct answer. For instance, if you switch storage engines, you might start getting different results.
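Putting the advice above together, here is a JDBC sketch that lets MySQL do the ordering and limiting, and adds a tie-breaker so LIMIT 1 has a single correct answer. The tie-breaker column id is an assumption (any unique column works), as are the class and method names. Statement.getFetchSize() is the standard JDBC way to read the fetch-size hint the question asks about; per the JDBC documentation, the value returned when no fetch size has been set is implementation-specific.

```java
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.util.Optional;

final class LatestCar {
    // "id" is an assumed unique column used only to break ties between equal manuDate values.
    static final String SQL =
            "SELECT modelName FROM Cars ORDER BY manuDate DESC, id DESC LIMIT 1";

    /** Returns the model of the most recently manufactured car, if the table is not empty. */
    static Optional<String> latestModel(Connection conn) throws SQLException {
        try (PreparedStatement ps = conn.prepareStatement(SQL);
             ResultSet rs = ps.executeQuery()) {
            return rs.next() ? Optional.ofNullable(rs.getString("modelName")) : Optional.empty();
        }
    }

    /** The fetch-size hint the driver reports for a fresh statement on this connection. */
    static int defaultFetchSize(Connection conn) throws SQLException {
        try (PreparedStatement ps = conn.prepareStatement(SQL)) {
            return ps.getFetchSize();
        }
    }
}
```

With an index on (manuDate, id), MySQL can read the first entry of that index instead of sorting the whole table, in line with the explanation above about indexes on manuDate.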
