Compare street name

I use Java, Spring, iBATIS, and an Oracle database.
The database has a table, Street, with 10 million records; the important column is street_name.
From the GUI I have to search for a company by street. For example, the street name entered is Schonburgstrasse, but the correct value in the DB is Schönburgstrasse (German).
As you can see, the main difference is o versus ö, so of course this SQL does not find the record:
Select * from Street where street_name = 'Schonburgstrasse';
The rules are:
I can't change the database schema any more.
I can't fetch all 10M records, normalize them one by one, and then compare the data.
(Normalization means I would have a function that converts Schönburgstrasse to Schonburgstrasse.)
I have to take care of performance.
Thanks for your time.

Try using the Oracle SOUNDEX function, so the query will look like this:
Select * from Street where soundex(street_name) = soundex('Schonburgstrasse');

Oracle Text provides extensive capabilities for handling umlauts etc. In short:
create a fulltext index on your column (using a custom lexer)
search with the contains() operator instead of like
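
The normalization the question describes (Schönburgstrasse to Schonburgstrasse) can at least be applied to the user's input on the Java side before it is handed to a query. The following is a minimal sketch, assuming that stripping Unicode combining marks is an acceptable definition of "normalized"; the class and method names are hypothetical, and letters without a decomposition (such as ß) pass through unchanged.

```java
import java.text.Normalizer;

// Hypothetical helper, not part of the original post.
final class StreetNames {
    private StreetNames() {}

    // Decomposes accented letters (ö becomes o + combining diaeresis)
    // and then removes the combining marks, leaving the base letters.
    static String stripDiacritics(String input) {
        String decomposed = Normalizer.normalize(input, Normalizer.Form.NFD);
        return decomposed.replaceAll("\\p{M}", "");
    }
}
```

This only normalizes the search term; the stored column still has to be matched through something like the index-based approaches above.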

Related

Splitting the string, mapping and building the SQL in Java

I have a task to do and can't figure out how to reach the result.
For example, I have a STRING like this:
key1=value1&key2=value2|key3=value3
The string consists of key=value pairs joined by AND (&) or OR (|).
I also have a map that contains the following:
key1=ke1Mapped; key2=key2Mapped; key3=key3Mapped;
I want to build a query of the form Select * from table1 where
and insert the mapped keys and their values, taking & and | into account, i.e. using AND or OR in the SQL query where needed.
The final SQL query should be:
Select * from table1 where ke1Mapped=value1 AND key2Mapped=value2 OR key3Mapped=value3
I tried to split the main string, convert it to a MAP, and then loop over the mapped values to build the SQL dynamically. But then I no longer know which operator, AND or OR, joins each pair.
I also tried pushing them onto two stacks, which got complicated as well.
For now I am only trying to understand how to structure this flow, and I can't figure out which approach is needed here.
Can anyone suggest something?
Thanks
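
One possible approach, sketched here under assumptions rather than taken from the thread: scan the string left to right with a regular expression that captures each key, value, and the operator that follows it, so the operator is never lost. The class and method names are hypothetical, and the values are emitted as ? placeholders for a PreparedStatement instead of being concatenated into the SQL.

```java
import java.util.List;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper illustrating a single left-to-right pass.
final class FilterSqlBuilder {
    // group 1 = key, group 2 = value, group 3 = trailing operator ("&", "|" or empty)
    private static final Pattern TERM = Pattern.compile("([^=&|]+)=([^&|]*)([&|]?)");

    private FilterSqlBuilder() {}

    // Returns the WHERE clause body and appends the values to params in order.
    static String buildWhere(String filter, Map<String, String> columns, List<String> params) {
        StringBuilder where = new StringBuilder();
        Matcher m = TERM.matcher(filter);
        int pos = 0;
        while (pos < filter.length()) {
            if (!m.find(pos) || m.start() != pos) {
                throw new IllegalArgumentException("Cannot parse filter at index " + pos);
            }
            String column = columns.get(m.group(1));
            if (column == null) {
                throw new IllegalArgumentException("Unknown key: " + m.group(1));
            }
            where.append(column).append(" = ?");
            params.add(m.group(2));
            pos = m.end();
            String op = m.group(3);
            if (!op.isEmpty()) {
                if (pos == filter.length()) {
                    throw new IllegalArgumentException("Filter ends with an operator");
                }
                where.append(op.equals("&") ? " AND " : " OR ");
            }
        }
        return where.toString();
    }
}
```

For the example string this produces ke1Mapped = ? AND key2Mapped = ? OR key3Mapped = ? with the three values as parameters. Note that SQL evaluates AND before OR, so inputs that rely on a different grouping would need parentheses, which this sketch does not handle.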

Pass multiple rows or an array to a DB2 stored procedure from Java

We need to update a DB2 database with the following kind of data, using a stored procedure called from a Java application.
ManId  ManFirstName  ManLastName  CubicleId  Unit  EmpId  EmpFirstName  EmpLastName
2345   Steeven       Rodrigue     12345RT    HR    2456   John          Graham
                                                   45464  Peter         Black
Here, the columns related to Emp (Emp Id, Emp First Name and Emp Last Name) are actually an array: my front-end application can send any number of employees for one manager.
We need to pass these values to the stored procedure and process them in the SP. However, I am not able to find any array datatype in DB2.
I know I can take the following two approaches:
1. Delimited value: have a varchar parameter, append all the values with a delimiter, and split them in the SP.
2456,John,Graham|45464,Peter,Black
2. Have a separate SP and call it in a batch.
However, I am looking for an approach where I can pass them in a single call using a more structured datatype. Does DB2 have a datatype like an array to support this, or any way to create a custom datatype?
I am using Spring JdbcTemplate at the front end to call the SP (I am flexible about changing that) and DB2 as the database.
P.S.: Open queries are not an option for me; I need to call an SP only.
This SP is going to be called from Java directly, so if I have to use a custom datatype, the only place to define it is the stored procedure being called.

Since the data types of Emp Id , Emp First Name and Emp Last Name are probably different, you should use a DB2 ROW type to contain them, not ARRAY. In Java that would be represented by java.sql.Struct. You can also pass an ARRAY of ROW types to the stored procedure. Check the manual for details and examples.

How to force hibernate to search for values with different patterns?

First of all, this question is not about ignoring white space at the beginning or end of strings, so it is not a duplicate.
I have a mobile field in the database whose values are in different formats such as xxx xxx xxx, xxxxxxxxx, x xxx xxx xx, etc. How can I make a Hibernate criteria query ignore the formatting of the strings?
For example, let's say the number in the database is 344 555 666:
344555666 fails
344 555 666 fails
344 succeeds (the first three digits, which have no space between them in the database!)
However, all of these numbers are legitimate inputs, and every one of them should return 344 555 666 as its result.
Another example:
Let's say a user searches for all phone numbers that include 12345, and the DB returns 12345678, 12345987 and 12345768. I then need to format these three numbers before showing them to the user.
Code
...
private String mobile;
....
Hibernate
.add(Restrictions.ilike("user.mobile", number));
PVR's answer is useful, but what if in the future I need to add a new format like XXX-XXX-XXX or X-XXXX-XXXX-XXXX? Please also note that there is only one field in which the user enters the search value.

Try the following:
criteria.add(Restrictions.ilike(
"user.mobile", number, MatchMode.ANYWHERE));
Edit:
I meant that if the number in the database can only be in one of the formats XXX XXX XXXX or XXXXXXXXXX, then we need specific logic that checks for both formats in the database.
number1: in the format XXX XXX XXXX
number2: in the format XXXXXXXXXX
criteria.add(Restrictions.or(
Restrictions.ilike("user.mobile", number1, MatchMode.ANYWHERE),
Restrictions.ilike("user.mobile", number2, MatchMode.ANYWHERE)));

When facing this kind of problem, I usually reverse it. Currently, a single column of your database (mobile) holds values in different formats (xxx xxx xxx, xxxxxxxxx, x xxx xxx xx, etc.), and it is hard to search on that column.
You should still allow mobile numbers to be entered in all those formats, but carefully rewrite them into one single format, say 123456789, before writing them to the database. This way the reformatting happens only when inserting or updating records, and I assume you will have many more reads than writes.
If you already have records in your database, you should consider running a batch job to transform them in one single operation.
Once you have only one format, you can put an index on the column, which can speed up select queries by orders of magnitude as soon as you have thousands of records.
When you want to search, allow the user to enter the value in any format, and apply the same transformation that you apply on insert. For example, if a user enters 123 456 789 or 123-456-789 or any other accepted format, your search code transforms it into 123456789 and runs the query with that value (using the index).
From the user's point of view, they can still enter the input however they want, and the responses may simply come faster. The only drawback is that you will display a standardized version of the value rather than the one they entered.
From your point of view (as the programmer), you get something simpler to write and maintain, with less stress on the database.
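
The single transformation described above can be very small. A minimal sketch, assuming the canonical format is "ASCII digits only" (the class and method names are hypothetical); the same method would be called before every insert, update, and search:

```java
// Hypothetical helper: reduces any accepted input format to digits only.
final class PhoneNumbers {
    private PhoneNumbers() {}

    static String normalize(String raw) {
        StringBuilder digits = new StringBuilder(raw.length());
        for (int i = 0; i < raw.length(); i++) {
            char c = raw.charAt(i);
            // Keep ASCII digits; drop spaces, hyphens, dots and anything else.
            if (c >= '0' && c <= '9') {
                digits.append(c);
            }
        }
        return digits.toString();
    }
}
```

Because new formats such as XXX-XXX-XXX differ only in their separators, they normalize to the same value without any change to this code.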

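Did you try Projections.sqlProjection?
You can use REPLACE(mobile, ' ') inside it. (For filtering in the WHERE clause, Restrictions.sqlRestriction accepts a raw SQL fragment in the same way.)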

I know this is an answer that could eat up your DB resources; you can test it and check whether it meets your needs.
I've done phone number formatting before, but the solution you are looking for could be difficult. If you have to search using a regex, I would construct the regex in code and search with it in the DB. (Oracle has the REGEXP_LIKE function; you may want to use that instead of Hibernate's ilike.)
E.g. phone number from the client: +333 555 9999; phone number in the DB: +3 33 555 9999
Construct the following regex based on what the client sends:
/\+[\s.-]*3[\s.-]*3[\s.-]*3[\s.-]*5[\s.-]*5[\s.-]*5[\s.-]*9[\s.-]*9[\s.-]*9[\s.-]*9[\s.\d\w-]*/
This says that between the digits there may be any number of dots (.), spaces (\s), or hyphens (-), and that the number may be followed by any run of those characters plus digits and letters (e.g. x234 or ext2342).
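
A minimal Java sketch of building such a pattern from whatever the client sends, assuming separators are limited to whitespace, dots, and hyphens. The class and method names are hypothetical, and the syntax is Java's java.util.regex; a pattern passed to Oracle may need POSIX classes such as [[:space:]] inside the brackets instead of \s.

```java
import java.util.regex.Pattern;

// Hypothetical helper: turns client input into a separator-tolerant pattern.
final class PhonePattern {
    // Any run of whitespace, dots or hyphens between two digits.
    private static final String SEP = "[\\s.-]*";

    private PhonePattern() {}

    static Pattern forInput(String input) {
        StringBuilder regex = new StringBuilder("^");
        for (char c : input.toCharArray()) {
            if (c == '+') {
                regex.append("\\+").append(SEP);
            } else if (c >= '0' && c <= '9') {
                regex.append(c).append(SEP);
            }
            // Every other character is treated as formatting and skipped.
        }
        // Optional trailing extension such as " x234" or "ext2342".
        regex.append("[\\s.\\w-]*$");
        return Pattern.compile(regex.toString());
    }
}
```

With the example above, PhonePattern.forInput("+333 555 9999") matches the stored value +3 33 555 9999, since only the digits and the leading plus sign are significant.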

As per your conversation with PVR, it seems the format of the phone number can be anything.
Hibernate criteria match on patterns; they cannot handle arbitrary formats on their own.
It is advisable not to include the phone-number criterion in Hibernate. Execute the rest of your criteria query without the phone number, and then filter the results by phone number in Java.
However, the best solution is to make your design more solid. Constraining the format of the phone number is the best practice; consider adding validation on the phone format.

You can write your own Criterion by implementing the Criterion interface.
In your toSqlString method, just use the replace function of your database. AFAIK replace(str, needle, replacement) is a SQL99 standard function, so it should work in today's DBMSs.

Query problem with MongoDB

I have 2 collections in a mongodb database.
example:
employee(collection)
_id
name
gender
homelocation (double[] indexed as geodata)
companies_worked_in (reference, list of companies)
companies(collection)
_id
name
...
Now I need to query all companies whose name starts with "wha" and that have or had employees who live near, for example, (13.444519, 52.512878).
How do I do that without it taking too long?
With SQL it would have been a simple join (without the geospatial search, of course... :( )

You can issue two queries. (The queries below are written in JavaScript.)
The first query extracts all companies whose name starts with wha.
db.companies.find({name: {$regex: "^wha"}}, {_id: 1})
The second query can look like this, where result_from_above_query stands for the array of _id values returned by the first query:
db.employee.find({homelocation: {$near: [x,y]}, companies_worked_in: {$in: result_from_above_query}}, {companies_worked_in: 1})
Then filter each employee's companies_worked_in list down to the companies whose name starts with wha. The first query may look useless here, but the $in clause filters out a lot of records.
You might have to write some intermediate code between these two queries. I know this is not a single-query solution, but it is one possible way to go, and performance is good depending on which fields you index. In this case, an index on name (companies collection) and on homelocation (geo-index) plus companies_worked_in (employee collection) would help performance.
P.S.
I doubt you can create a composite index over homelocation and companies_worked_in, since both are arrays. You would probably have to index only one of these fields.
Suggestion
Store the company name in the employee collection as well. That way you can avoid the first query.

Find matching records with least characters from Pattern - Oracle / Java

The web application I am currently working on has file import logic. The logic
1> reads the records from a file [Excel or txt],
2> shows a non-editable grid of all the imported records [records are marked as New if they do not exist in the database, and existing records are marked as Update] and
3> dumps the records into the database.
The file contains contacts in the following format (it mirrors the columns in the database, with primary keys First_Name, Last_Name):
First_Name, Last_Name, AddressLine1, AddressLine2, City, State, Zipcode
The issue we are running into arises when different values are entered in the file for the same entity. For example, someone might type NY for New York while others put in New York. The same applies to first or last names: John Myers and John Myer refer to the same person, but because the record does not match exactly, the system inserts a new record rather than updating the existing one.
For example, for this record from the file (please note the name and address usage is purely coincidental :) ):
John, Myers, 44 Chestnut Hill, Apt 5, Indiana, Indiana, 11111
and this record in the database:
John, Myer, 80 Washington St, Apt 1, Chicago, IL, 3333
the system should have detected the record in the file as an existing record [because the last names are Myers and Myer and the first name matches completely] and updated the address, but instead it inserts a new record.
How can I approach this issue, so that I find all the existing records in the database that an imported record should match?

It is a very difficult problem to solve. If you know the sources of your data, you could attempt to manually rectify the different combinations of data input.
Otherwise, you could try phonetic data-cleaning solutions.

One solution I could think of is using regular expressions in Oracle to achieve the functionality to some extent.
For each column, I would generate a regular expression in which everything from half way through the string onward is optional. For example, for the name "Myers" in the file and "Myer" in the database, the following query would work:
SELECT Last_Name from Contacts WHERE (Last_Name IS NULL OR Regexp_Like(Last_Name, '^Mye?r?s?$'))
I consider this a partial solution: I would parse the input string and append the zero-or-one operator (?) to each character from half the length to the end of the string, hoping the input string is not too messed up.
Hoping to get some feedback from others on SO for this "solution".
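
Generating that expression in Java could look like the sketch below. It assumes "half way" means integer division of the length and that non-alphanumeric characters are simply backslash-escaped; the class and method names are hypothetical. For "Myers" it yields ^Mye?r?s?$ and for "Myer" it yields ^Mye?r?$.

```java
// Hypothetical helper: makes the second half of a name optional, character by character.
final class LooseNameRegex {
    private LooseNameRegex() {}

    static String forName(String name) {
        int keep = name.length() / 2; // characters before this index stay mandatory
        StringBuilder regex = new StringBuilder("^");
        for (int i = 0; i < name.length(); i++) {
            char c = name.charAt(i);
            if (!Character.isLetterOrDigit(c)) {
                regex.append('\\'); // escape spaces, dots, apostrophes and the like
            }
            regex.append(c);
            if (i >= keep) {
                regex.append('?'); // zero-or-one operator
            }
        }
        return regex.append('$').toString();
    }
}
```

The resulting string would be bound as the pattern argument of Regexp_Like. The expression only tolerates characters missing from the tail, so typos in the first half of a name are not covered.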
