Check if a data set is present in a database - Java

I have an array of strings in my Java code. I want to check which of the values in that array are present in the MySQL database I am using. The way I have tried to do it is to query the database for each individual value in the array. I wanted to know if there is a more efficient way of doing this?
Thanks in advance.
In my Java code:
String[] arrProducts = new String[]{"AB","BC","CD","AE","fg","BV","etc"};
In my MySQL Products database I have a productsInventory table, which has a column productId. So basically I want to check whether all the arrProducts values are present in the column productId, instead of querying individually like: Select * from ProductsInventory where productId like 'AB'.
EDIT1:
So my table looks like this:
ProductsInventory->{ProductName,productId}
Right now I am querying the table using one query for each value in my array.
Eg:
Select * from ProductsInventory where productId like 'AB';
Select * from ProductsInventory where productId like 'BC';
Select * from ProductsInventory where productId like 'CD';
So depending on the number of elements in my array I need to send multiple queries.
EDIT 2: And my array can change depending on user interaction. What the user enters is stored in my array and I need to check if those values are present in the database table.

You can use the IN keyword. Just pass the array values as the IN list; the resulting query will look like this:
select * from ProductsInventory where productId in ('AB','BC','CD','AE','fg','BV','etc');
But IN can show considerable performance degradation if the list is very long.
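To run this from Java without concatenating user input into the SQL string, you can build the IN clause with one placeholder per value and bind them through a PreparedStatement. A minimal sketch, with the table and column names taken from the question; the JDBC part is shown as comments since it needs a live connection:

```java
import java.util.Collections;

public class InQueryBuilder {
    // Builds a parameterized IN query with one '?' placeholder per value,
    // e.g. "SELECT productId FROM ProductsInventory WHERE productId IN (?,?,?)".
    public static String buildInQuery(String[] values) {
        String placeholders = String.join(",", Collections.nCopies(values.length, "?"));
        return "SELECT productId FROM ProductsInventory WHERE productId IN (" + placeholders + ")";
    }

    public static void main(String[] args) {
        String[] arrProducts = {"AB", "BC", "CD"};
        String sql = buildInQuery(arrProducts);
        System.out.println(sql);
        // With a real java.sql.Connection you would then bind each value:
        // PreparedStatement ps = conn.prepareStatement(sql);
        // for (int i = 0; i < arrProducts.length; i++) ps.setString(i + 1, arrProducts[i]);
        // Every productId returned by ps.executeQuery() is present in the table.
    }
}
```

A single round trip like this also tells you which values exist: any array element missing from the result set is not in the table.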

As an alternative to the IN construct, or to regular-expression matching with RLIKE, you can use full-text search. MySQL's integrated full-text search may be limited in speed and functionality, but third-party engines that integrate nicely with MySQL and Java, such as Sphinx or Solr, can provide good performance for complex functionality at the cost of more resources: full-text options in MySQL allow searching in natural language.

You can try something like:
SELECT * FROM ProductsInventory WHERE productId IN ('AB','BC','CD')
It will probably save you time, but depending on what's in the database and how things are structured it could be quite expensive (in time and processing).

Related

Position Autoincrement in Talend

So I'm a bit lost and don't really know how to approach this one...
Consider that I have 2 DB tables in Talend. First,
a table invoices_only which has the fields invoiceNummer and author, like this.
Then, a table invoices_table with the fields (invoiceNummer, article, quantity and price); for one invoice I can have many articles, for example.
Through a tMap I want to obtain a table invoice_table_result with new columns: one for the article position, and another for the total price. For the position I know that I can use something like the Numeric.sequence("s1",1,1) function, but I don't know how to restart my counter when a new invoiceNummer is found; and of course the total price is just a basic multiplication.
So my result should be something like this.
Here is a draft of my Talend job; I'm doing a lookup on the invoiceNummer between the tables invoices_only and invoices.
Any advice? Thanks.
A trick I use is to do the sequence like this:
Numeric.sequence("s" + row.InvoiceNummer, 1, 1)
This way, the sequence gets incremented while you're still on the same InvoiceNummer, and a new one is started whenever a new InvoiceNummer is found.
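The effect of keying the sequence name on the invoice number can be sketched in plain Java, outside Talend, with a per-key counter; `PerKeySequence` and `nextPosition` are hypothetical names for illustration, not a Talend API:

```java
import java.util.HashMap;
import java.util.Map;

public class PerKeySequence {
    private final Map<String, Integer> counters = new HashMap<>();

    // Returns 1 the first time a key is seen, then 2, 3, ... on each repeat,
    // mirroring what Numeric.sequence("s" + row.InvoiceNummer, 1, 1) does:
    // each distinct sequence name keeps its own independent counter.
    public int nextPosition(String invoiceNummer) {
        return counters.merge(invoiceNummer, 1, Integer::sum);
    }

    public static void main(String[] args) {
        PerKeySequence seq = new PerKeySequence();
        for (String inv : new String[]{"A1", "A1", "A1", "A2", "A2"}) {
            System.out.println(inv + " -> position " + seq.nextPosition(inv));
        }
    }
}
```

As with the Talend trick, this assumes the rows arrive grouped by invoice number, so each invoice's positions come out contiguously.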
There are two ways to achieve it,
tJavaFlex
Sql
tJavaFlex
You can compare the current row with the previous one and reset the sequence value using the function below (the condition here is a placeholder; compare whatever field identifies the invoice):
if (!currentInvoiceNummer.equals(previousInvoiceNummer)) {
    Numeric.resetSequence(seqName, startValue);
}
Sql
Once the data is loaded into the tables, create a post-job and use an update query to update the records. You have to select the records and take the rank of the values, then perform the update on top of that select.
select invoicenumber, row_number() over (partition by invoicenumber order by invoicenumber) from table_name where -- conditions if any
Update statements vary with the database; please mention which database you are using, so that I can provide the update query.
I would recommend you achieve this through SQL.

Speed up the data retrieval from Documentum via DQL

I am doing a Java project connected to Documentum and I need to retrieve data from an object table. The thing is, when I retrieve from one table I get the answers in at most 2 seconds for each of the following tables with the following DQLs:
SELECT * FROM cosec_general
and
SELECT * FROM dm_dbo.cosec_general_view
however, once I join those two tables together, retrieving the result takes 5 minutes.
Is there any way that I can make it faster?
Here is the DQL that I use to join them and get the columns that I need:
SELECT dm_dbo.cosec_general_view.name, dm_dbo.cosec_general_view.comp_id,
dm_dbo.cosec_general_view.bg_name, dm_dbo.cosec_general_view.incorporation_date,
dm_dbo.cosec_general_view.status, dm_dbo.cosec_general_view.country_name,
cosec_general.acl_domain, cosec_general.acl_name
FROM dm_dbo.cosec_general_view, cosec_general
There is no condition saying which fields you are joining on, so the two tables are combined as a Cartesian product.
Add a WHERE clause containing the join condition, like:
WHERE dm_dbo.cosec_general_view.field_1 = cosec_general.field_2
You are using the wrong approach. In the query
SELECT * FROM cosec_general
the asterisk * means "return me everything". Once you have loaded the information into memory, manipulating the objects should take milliseconds.

query a number of rows in sqlite ordered data faster?

I have a SQLite db whose data are ordered by one string field. I want to retrieve a number of rows starting from a certain value, in dictionary order of that string field. Right now I use a LIMIT query, but I'm not satisfied with the speed. I think it would be better to search only for the first row, and then tell SQLite to fetch the number of rows that follow; how can I do that? (My English is not good, sorry if I'm wrong.)
I am not sure what you mean but I will try to answer. Sorry in advance if I am off track.
Let's say your string column is called mystring. If you want good performance when selecting with a specific order, you need an index on the column you sort by.
Your query will look like this:
SELECT *
FROM mytable
ORDER BY mystring
LIMIT 5,10
With the index in place, that is the fastest way to do it with LIMIT/OFFSET. For large offsets it can still be slow, because SQLite must walk past all the skipped rows; seeking directly with WHERE mystring > :last_seen_value ... LIMIT 10 avoids that, which matches your idea of finding the first row and then reading the rows that follow.

Fuzzy Matching in H2 Database?

I was just wondering if there is a simple way to implement fuzzy matching of strings using the H2 database.
I have a list of names in the database and I want to be able to search through them using 3 characters that may be found anywhere in the name, in the order the 3 characters are typed.
I'm not sure if that's even possible, but it would make life much easier if it could be done in the database via SQL and not in Java.
You could use
select * from test where name like '%xyz%'
See also the documentation of LIKE.
Another option is to use SOUNDEX:
select * from test where soundex(name) = soundex('word')
In both cases, an index cannot be used. That means the query will be slow if there are many rows in the table, as each row must be checked.
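For the "3 characters anywhere in the name, in typed order" requirement, a LIKE pattern with a wildcard between each character works better than '%xyz%', which requires the characters to be adjacent. A minimal sketch of building such a pattern on the Java side (`FuzzyPattern` is a hypothetical helper; escaping of LIKE metacharacters is omitted):

```java
public class FuzzyPattern {
    // Builds a LIKE pattern such as "%a%b%c%" that matches names containing
    // the typed characters in order, with anything in between.
    public static String likePattern(String typed) {
        StringBuilder sb = new StringBuilder("%");
        for (char c : typed.toCharArray()) {
            sb.append(c).append('%');
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // The resulting pattern would be bound to a query such as:
        //   SELECT * FROM test WHERE name LIKE ?
        System.out.println(likePattern("abc")); // prints %a%b%c%
    }
}
```

Like the other LIKE variants, a leading-wildcard pattern cannot use an index, so the same full-scan caveat applies.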

Determine if column in ResultSet contains values in all rows

In my application, I perform a costly query that takes minutes to produce a report. I am trying to make a generic class that transforms a ResultSet into an Excel spreadsheet, where a column is excluded from the spreadsheet if it contains only nulls. I can remove the columns from the Excel sheet after the fact easily, but it is difficult to "glue" worksheets back together after I have already split them when there are too many columns.
I could do a query to check if each column is null, but this would entail running the costly query all over again, perhaps multiple times, which would make the generation of the spreadsheet take too long.
Is there a way that I can query the ResultSet object that I already have (a little like ColdFusion) and remove columns from it?
EDIT
I ended up adding a pre-processing step where I added the column numbers of the used columns to a List<Integer> and then iterating through that collection rather than the set of all columns in the ResultSet. A few off-by-one errors later, and it works great.
Can you extract the data from the ResultSet and store it in memory first, before creating the work sheet, or is it too large? If so, then while you're extracting it you could remember whether a non-null value has been seen in each column. Once you're done extracting, you know exactly which columns can be omitted. Of course this doesn't work so well if the amount of data is so large that you wouldn't want to store it in memory.
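The "remember whether a non-null value has been seen in each column" idea can be sketched over in-memory rows; here plain Object[] arrays stand in for rows already read from the ResultSet, and `NullColumnDetector` is an illustrative name, not a real API:

```java
import java.util.Arrays;
import java.util.List;

public class NullColumnDetector {
    // Returns, for each column index, whether at least one row holds a non-null value.
    // Columns whose flag stays false can be omitted from the spreadsheet.
    public static boolean[] usedColumns(List<Object[]> rows, int columnCount) {
        boolean[] hasValue = new boolean[columnCount];
        for (Object[] row : rows) {
            for (int i = 0; i < columnCount; i++) {
                if (row[i] != null) {
                    hasValue[i] = true;
                }
            }
        }
        return hasValue;
    }

    public static void main(String[] args) {
        List<Object[]> rows = Arrays.asList(
            new Object[]{"a", null, 1},
            new Object[]{"b", null, null});
        // Column 1 is all nulls, so it can be skipped when writing the sheet.
        System.out.println(Arrays.toString(usedColumns(rows, 3))); // prints [true, false, true]
    }
}
```

In the real class you would populate the flags inside the same loop that reads the ResultSet, so the costly query is still executed only once.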
Another solution would be to store the results of the costly query in a "results" table in the database. Each row for a given query execution would get stamped with a "query id" taken from a database sequence. Once the data is loaded into this table, subsequent queries to check whether "all values in column X are null" should be pretty speedy.
Note: if you're going to take this second approach, don't pull all the query data up to your application before storing it back to the results table. Rewrite the original "costly" query to do the insert. "insert into query_result(columns...) select {costly query}".
I could do a query to check if each column is null
Better still, you could incorporate that check into the original query, via COUNT(column) etc.; since COUNT over a column counts only non-null values, a count of zero means the column is entirely null. This will be miles quicker than writing Java code to the same effect.
