SQL logic context/scenario - Java

I have this scenario: a user wants to see tons of information about himself, for example: age, name, status, income, job, hobby, children's names, wife's name, chief's name, grandfather's/grandmother's names. About 50 variables, and he can choose any of the variables to display.
So I have this class *Impl.java taking 50 params. Of those 50 params, let's say 25 will be null and the others will be shown. It will return the selected information.
How can I create a SQL query that returns only the columns selected through the params? Should I create a procedure and then do the select query? Or is what I'm trying to achieve a bad idea?
I'm using Web Services and Spring JDBC. If more information is required, I'll edit.

Building a SELECT statement to return arbitrarily selected columns can be tricky (dynamic SQL) at best and dangerous (SQL Injection) at worst. If there are only 50 columns and the query used to pull them is relatively trivial*, I'd say write the query to pull all possible values for one user and then have the application sift and sort through the data they actually want to see.
*It really does seem like the query should be trivial. At a super-high average of 25 bytes per column that'd be 1250 bytes, aka nothing in 21st century terms, and at maybe one row per table joined via primary key it should still be sub-100th-second work.
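For illustration, a minimal sketch of that approach with Spring JDBC; the table and column names here are placeholders, since the question doesn't show the schema:
import java.util.List;
import java.util.Map;
import org.springframework.jdbc.core.JdbcTemplate;

public class UserInfoDao {
    private final JdbcTemplate jdbcTemplate;

    public UserInfoDao(JdbcTemplate jdbcTemplate) {
        this.jdbcTemplate = jdbcTemplate;
    }

    // Fixed, parameterized query (no dynamic SQL, so no injection risk):
    // pull every column for one user, then keep only the requested fields.
    // "user_info" and "user_id" are placeholder names, not from the question.
    public Map<String, Object> selectedInfo(long userId, List<String> requestedFields) {
        Map<String, Object> row = jdbcTemplate.queryForMap(
            "SELECT * FROM user_info WHERE user_id = ?", userId);
        row.keySet().retainAll(requestedFields);  // application-side sifting
        return row;
    }
}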


Position Autoincrement in Talend

So I'm a bit lost and don't really know how to tackle this one...
Consider that I have two DB tables in Talend. First,
a table invoices_only which has as fields the invoiceNummer and the authors, like this
Then, a table invoices_table with the fields (invoiceNummer, article, quantity and price), where for one invoice I can have many articles, for example
Through a tMap I want to obtain a table invoice_table_result with new columns: one for the article position and another for the total price. For the position I know that I can use something like the Numeric.sequence("s1",1,1) function, but I don't know how to restart my counter when a new invoice nummer is found; the total price, of course, is just a basic multiplication.
So my result should be something like this
Here is a draft of my Talend job; I'm doing a lookup on the invoiceNummer between the tables invoices_only and invoices.
Any advice? Thanks.
A trick I use is to do the sequence like this:
Numeric.sequence("s" + row.InvoiceNummer, 1, 1)
This way, the sequence gets incremented while you're still on the same InvoiceNummer, and a new one is started whenever a new InvoiceNummer is found.
There are two ways to achieve it:
tJavaFlex
SQL
tJavaFlex
You can compare the current row with the previous one and reset the sequence using the function below (the condition and variable are a sketch; declare previousInvoiceNummer in the tJavaFlex start code):
if (!row1.InvoiceNummer.equals(previousInvoiceNummer)) {
    Numeric.resetSequence("s1", 1);
}
previousInvoiceNummer = row1.InvoiceNummer;
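Alternatively, you can skip the sequence routines entirely and keep a plain counter in the tJavaFlex. A minimal sketch, assuming the incoming flow is named row1, the outgoing flow row2, and the columns follow the question:
// tJavaFlex start code: runs once, before the first row.
String prevInvoice = null;
int position = 0;

// tJavaFlex main code: runs once per row.
if (!row1.InvoiceNummer.equals(prevInvoice)) {
    position = 0;                                  // new invoice: restart the counter
    prevInvoice = row1.InvoiceNummer;
}
position++;
row2.InvoiceNummer = row1.InvoiceNummer;
row2.article = row1.article;
row2.position = position;                          // per-invoice article position
row2.totalPrice = row1.quantity * row1.price;      // the basic multiplication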
SQL
Once the data is loaded into the tables, create a post job and use an update query to update the records. You have to select the records and take the rank of the values, then perform the update on top of that select.
select invoicenummer, row_number() over (partition by invoicenumber order by invoicenumber) from table_name -- plus where conditions, if any
Update statements vary with the database; please mention which database you are using, so that the update query can be provided.
I would recommend achieving this through SQL.
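For instance, if the database turned out to be MySQL 8+ (an assumption, since window functions are needed), the whole rank-and-update could be a single statement fired from a post job; the position column and the (invoiceNummer, article) key below are assumptions as well:
import java.sql.Connection;
import java.sql.SQLException;
import java.sql.Statement;

public class PositionUpdater {
    // Sketch of the post-job update described above.
    public static void fillPositions(Connection con) throws SQLException {
        String sql =
            "UPDATE invoices_table t " +
            "JOIN (SELECT invoiceNummer, article, " +
            "             ROW_NUMBER() OVER (PARTITION BY invoiceNummer ORDER BY article) AS rn " +
            "      FROM invoices_table) r " +
            "  ON t.invoiceNummer = r.invoiceNummer AND t.article = r.article " +
            "SET t.position = r.rn";
        try (Statement st = con.createStatement()) {
            st.executeUpdate(sql);
        }
    }
}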

Creating report from 1 million + records in MySQL and display in Java JSP page

I am working on a MySQL database with 3 tables - workout_data, excercises and sets. I'm facing issues generating reports based on these three tables.
To add more information: a number of sets make up an excercise, and a number of excercises make up a workout.
I have the metrics from which a report is to be generated out of the data in these tables. I have to generate reports for the past 42 days, including this week. The queries joining these tables run for a long time before the report comes back.
For example, the sets table has more than 1 million records just for the past 42 days. Its id is the excercise_id in the excercises table, and the id of the excercises table is the workout_id in the workout_data table.
I'm running this query and it takes more than 10 minutes to get the data. I have to prepare a report and show it to the user in the browser, but due to this long-running query the webpage times out and the user never sees the report.
Any advice on how to achieve this?
SELECT REPORTSETS.USER_ID, REPORTSETS.WORKOUT_LOG_ID,
       REPORTSETS.SET_DATE, REPORTSETS.EXCERCISE_ID, REPORTSETS.SET_NUMBER
FROM EXCERCISES
INNER JOIN REPORTSETS ON EXCERCISES.ID = REPORTSETS.EXCERCISE_ID
WHERE user_id = (SELECT id FROM users WHERE email = 'testuser1#gmail.com')
  AND substr(set_date, 1, 10) = '2013-10-29'
GROUP BY REPORTSETS.USER_ID, REPORTSETS.WORKOUT_LOG_ID,
         REPORTSETS.SET_DATE, REPORTSETS.EXCERCISE_ID, REPORTSETS.SET_NUMBER
Two things:
First, you have the following WHERE clause item to pull out a single day's data.
AND substr(set_date,1,10)='2013-10-29'
This defeats the use of an index on the date entirely. If your set_date column has a DATETIME datatype, what you want is
AND set_date >= '2013-10-29'
AND set_date < '2013-10-29' + INTERVAL 1 DAY
This will allow the use of a range scan on an index on set_date. It looks to me like you might want a compound index on (user_id, set_date). But you should muck around with EXPLAIN to figure out whether that's right.
Second, you're misusing GROUP BY. That clause is pointless unless you have some kind of summary function like SUM() or GROUP_CONCAT() in your query. Do you want ORDER BY?
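Putting the two together, the reworked query might look like the sketch below, as it would be issued from the Java side; the compound index is only the guess mentioned above (check it with EXPLAIN), and GROUP BY is swapped for ORDER BY:
// Hypothetical index, to be verified with EXPLAIN:
//   CREATE INDEX idx_user_date ON REPORTSETS (USER_ID, SET_DATE);
String sql =
    "SELECT rs.USER_ID, rs.WORKOUT_LOG_ID, rs.SET_DATE, rs.EXCERCISE_ID, rs.SET_NUMBER " +
    "FROM EXCERCISES e " +
    "INNER JOIN REPORTSETS rs ON e.ID = rs.EXCERCISE_ID " +
    "WHERE rs.USER_ID = (SELECT id FROM users WHERE email = ?) " +
    "  AND rs.SET_DATE >= '2013-10-29' " +
    "  AND rs.SET_DATE <  '2013-10-29' + INTERVAL 1 DAY " +
    "ORDER BY rs.SET_DATE, rs.EXCERCISE_ID, rs.SET_NUMBER";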
Comments on your SQL that you might want to look into:
1) Do you have an index on USER_ID and SET_DATE?
2) Your datatype for SET_DATE looks wrong; is it a varchar? Storing it as a date will let the db optimise your search much more efficiently. At the moment the substr method is called countless times per query, as it has to run for every row considered by the first part of your where clause.
3) Is the group by really required? Unless I'm missing something, the 'group by' part of the statement brings nothing to the table ;)
It should make a significant difference if you could store the date either as a date, or in the format you need to make the comparison. Performing a substr() call on every date must be time consuming.
Surely the suggestions about tuning the query will help to improve its speed. But I think the main point here is what can be done with more than 1 million records before the session times out. What if you have 2 or 3 million records; will performance tuning alone solve the problem? I don't think so. So:
1) If you want to display it in the browser, use pagination and query (for example) only the first 100 records.
2) If you want to generate a report (like a PDF), then use an asynchronous method (JMS).
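A minimal sketch of option 1) with Spring JDBC's JdbcTemplate (plain JDBC works the same way; the 0-based page parameter and the simplified query are illustrative):
import java.util.List;
import java.util.Map;
import org.springframework.jdbc.core.JdbcTemplate;

public class ReportPager {
    private final JdbcTemplate jdbc;

    public ReportPager(JdbcTemplate jdbc) {
        this.jdbc = jdbc;
    }

    // Fetch one page of report rows instead of the whole 42-day result set.
    public List<Map<String, Object>> page(long userId, int page, int pageSize) {
        return jdbc.queryForList(
            "SELECT USER_ID, WORKOUT_LOG_ID, SET_DATE, EXCERCISE_ID, SET_NUMBER " +
            "FROM REPORTSETS WHERE USER_ID = ? " +
            "ORDER BY SET_DATE, EXCERCISE_ID, SET_NUMBER " +
            "LIMIT ? OFFSET ?",
            userId, pageSize, (long) page * pageSize);
    }
}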

search resultset by column

I have a problem regarding the ResultSet of a large database. (MySQL DB, Java 1.7)
The task is to perform a transformation of all the entries of one column into another database.
(e.g. divide every number by three and write it into the other database)
As the database contains about 70 columns and a few million rows, my first approach would have been to do a SELECT * and parse the ResultSet column by column.
Unfortunately I found no way to parse it this way, as the designated way is to go through it row by row (while (rs.next()) {} etc.).
I don't like this way, as it would create 70 large arrays at once, whereas I only ever need one at a time to keep memory usage down.
So here are my main questions:
Is there a way to parse the ResultSet by column?
Should I create a query for every column and parse them (one array at a time, but 70 queries), or
should I just get the whole ResultSet and parse it row by row, writing the values into 70 arrays?
Greetings and thanks in advance!
Why not just page your queries? Pull out 'n' rows at a time, perform the transformation, and then write them into the new database.
This means you don't pull everything up in one query/iteration and then write the whole lot in one go, and you don't have the inefficiencies of working row-by-row.
My other comment is that perhaps this is premature optimisation. Have you tried loading the whole dataset and seeing how much memory it takes? If it's of the order of tens or even hundreds of megs, I would expect the JVM to handle that easily.
I'm assuming your transformation needs to be done in Java. If you can possibly do it in SQL, then doing it entirely within the database is likely to be even more efficient.
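For example, a paged read-transform-write loop could look like this; a sketch only, where the table and column names are invented and keyset paging on a numeric id column (MySQL LIMIT syntax) is assumed:
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.SQLException;

public class ColumnTransformer {
    private static final int PAGE_SIZE = 10_000;

    // Reads source_table page by page, divides value_col by three,
    // and writes the results into target_table in the other database.
    public static void transform(Connection src, Connection dest) throws SQLException {
        long lastId = 0;
        try (PreparedStatement read = src.prepareStatement(
                 "SELECT id, value_col FROM source_table " +
                 "WHERE id > ? ORDER BY id LIMIT " + PAGE_SIZE);
             PreparedStatement write = dest.prepareStatement(
                 "INSERT INTO target_table (id, value_col) VALUES (?, ?)")) {
            while (true) {
                read.setLong(1, lastId);
                int rows = 0;
                try (ResultSet rs = read.executeQuery()) {
                    while (rs.next()) {
                        lastId = rs.getLong("id");
                        write.setLong(1, lastId);
                        write.setDouble(2, rs.getDouble("value_col") / 3.0); // the transformation
                        write.addBatch();
                        rows++;
                    }
                }
                write.executeBatch();  // flush one page of inserts
                if (rows < PAGE_SIZE) {
                    break;  // last (possibly partial) page processed
                }
            }
        }
    }
}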
Why don't you do it with MySQL only?
Use this query:
create table <table_name> as select <column_name_on_which_you_want_transformation>/3 from <source_table_name>;

persisting dynamic properties and query

I have a requirement to implement a contact database. This contact database is special in that the user should be able to dynamically (at runtime) add properties he/she wants to track about the contact. Some of these properties are strings, others numbers and dates. Some of the properties have pre-defined values, others are free fields, etc. The user also wants to be able to query such a structure quickly and easily. The database needs to handle 500,000 contacts easily, each having around 10 properties.
This leads to a dynamic property model: a Contact class with dynamic properties.
class Contact {
    private Map<DynamicProperty, Collection<DynamicValue>> propertiesAndValues;
    // other useful methods
}
The question is how I can store such a structure in "some database" (it does not have to be an RDBMS) so that I can easily express queries such as:
Get all contacts whose name starts with Martin, they are from Company of size 5000 or less, order by time when this contact was inserted in a database, only first 100 results (provide pagination), where each of these segments correspond to a dynamic property.
I need:
filtering - equal, partial equal, (bigger, smaller for integers, dates) and maybe aggregation - but it is not necessary at this point
sorting
pagination
I was considering an RDBMS, but that leads more or less to the following structure, which is quite hard to query and tends to be slow for this amount of data:
contact(id serial pk,....);
dynamic_property(dp_id serial pk, ...);
--only one of the values is not empty
dynamic_property_value(dpv_id serial pk, dynamic_property_fk int, value_integer int, date_value timestamp, text_value text);
contact_properties(pav_id serial pk, contact_id_fk int, dynamic_propert_fk int);
property_and_its_value(pav_id_fk int, dpv_id int);
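To illustrate the querying pain: a single "name starts with Martin" filter against that layout already needs a join for every hop, and each additional filter or sort adds another block like it. A sketch following the DDL above (the property id parameter and the Java string are illustrative):
// One filter on one dynamic property already takes four joins (a sketch).
String sql =
    "SELECT c.id FROM contact c " +
    "JOIN contact_properties cp ON cp.contact_id_fk = c.id " +
    "JOIN dynamic_property dp ON dp.dp_id = cp.dynamic_propert_fk " +
    "JOIN property_and_its_value pv ON pv.pav_id_fk = cp.pav_id " +
    "JOIN dynamic_property_value dpv ON dpv.dpv_id = pv.dpv_id " +
    "WHERE dp.dp_id = ? " +          // the 'name' property's id
    "  AND dpv.text_value LIKE ?";   // e.g. 'Martin%'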
I am considering the following options:
store contacts in an RDBMS and use Lucene for querying - is there anything that would help with this?
store the dynamic properties as XML in the RDBMS and use its XPath support - unfortunately that seems to be pretty slow for 500,000 contacts
use another database - MongoDB or Jackrabbit - to store this information
Which way would you go and why?
Wikipedia has a great entry on Entity-Attribute-Value modeling which is a data modeling technique for representing entities with arbitrary properties. It's typically used for clinical data, but might apply to your situation as well.
Have you considered using Lucene for your querying needs? You could probably get away with just using Lucene and store all your data in the index. Although I wouldn't recommend using Lucene as your only persistence store.
Alternatively, you could use Lucene along with a RDBMS and take advantage of something like Compass.
You could try another kind of database, like CouchDB, which is a document-oriented DB and is distributed.
If you want a dumb solution: for your contacts table you could add some 50 columns like STRING_COLUMN1, STRING_COLUMN2... up to 10, and DATE_COLUMN1..DATE_COLUMN10, plus another DESCRIPTION column. So if a row has a name, which is a string, then STRING_COLUMN1 stores the value of the name and the DESCRIPTION column value would be "STRING_COLUMN1-NAME". In this case querying can be a bit tricky. I know many purists laugh at this, but I have seen a similar requirement solved this way in one of the apps :)

Hibernate displaytag big lists

I'm using displaytag to build tables with data from my db. This works well if the requested list isn't that big, but once the list grows over 2500 entries, fetching the result list takes very long (more than 5 min). I was wondering if this behavior is normal.
How do you handle big lists / queries which return big results?
This article links to an example app of how to go about solving the problem. Displaytag expects to be passed a full dataset to create paging links and handle sorting. This kind of breaks the idea of paging externally on the data and fetching only those rows that are asked for (as the user pages to them). The project linked in the article describes how to go about setting this type of thing up.
If you're working with a large database, you could also have a problem executing your query. I assume you have ruled this out. If not, you have the SQL as mentioned earlier - I would run it through the DB2 query analyzer to see if there are any DB bottlenecks. The next step up the chain is to run a test of the Hibernate/DAO call in a unit test without displaytag in the mix. Again, from how you've worded things, it sounds like you've already done this.
Displaytag hauls and stores everything in memory (the session). Hibernate does that too. You don't want to have the entire DB table contents in memory at once (although if the slowdown already begins at 2500 rows, it looks more like a matter of a badly optimized SQL query / DB table; 2500 rows should be peanuts for a decent DB, but OK, that's another story).
Rather, create the HTML table yourself with a little help from JSTL's c:forEach and a shot of EL. Keep one or two request parameters in the background in an input type="hidden": the first row to be displayed (firstrow) and possibly the number of rows to be displayed at once (rowcount).
Then, in your DAO class, just do a SELECT stuff FROM data LIMIT rowcount OFFSET firstrow or something like that, depending on the DB used. In MySQL and PostgreSQL you can use the LIMIT and/or OFFSET clauses like that. In Oracle you'll need a subquery. In MSSQL and DB2 you'll need to create a SP. You can do that with HQL.
Then, to page through the table, just have a bunch of buttons which instruct the server-side code to increment/decrement firstrow by rowcount every time. Just do the math.
Edit: you commented that you're using DB2. I've done a bit of research and it appears that you can use the DB2 UDB OLAP function ROW_NUMBER() for this:
SELECT id, colA, colB, colC
FROM (
    SELECT ROW_NUMBER() OVER (ORDER BY id) AS row, id, colA, colB, colC
    FROM data
) AS temp_data
WHERE row BETWEEN 1 AND 10;
This example should return the first 10 rows from the data table. You can parameterize this query so that you can reuse it for every page. This is more efficient than querying the entire table in Java's memory. Also ensure that the table is properly indexed.
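A parameterized version could be wired up roughly like this (a sketch; firstrow and rowcount follow the naming used above, and an open Connection is assumed):
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class PagingDao {
    // Fetch one page using the ROW_NUMBER() query above; firstrow is 1-based.
    public List<Map<String, Object>> fetchPage(Connection con, int firstrow, int rowcount)
            throws SQLException {
        String sql =
            "SELECT id, colA, colB, colC FROM (" +
            "  SELECT ROW_NUMBER() OVER (ORDER BY id) AS row, id, colA, colB, colC" +
            "  FROM data" +
            ") AS temp_data WHERE row BETWEEN ? AND ?";
        List<Map<String, Object>> page = new ArrayList<>();
        try (PreparedStatement ps = con.prepareStatement(sql)) {
            ps.setInt(1, firstrow);
            ps.setInt(2, firstrow + rowcount - 1);
            try (ResultSet rs = ps.executeQuery()) {
                while (rs.next()) {
                    Map<String, Object> row = new HashMap<>();
                    row.put("id", rs.getObject("id"));
                    row.put("colA", rs.getObject("colA"));
                    row.put("colB", rs.getObject("colB"));
                    row.put("colC", rs.getObject("colC"));
                    page.add(row);
                }
            }
        }
        return page;  // ready to hand to the JSP for c:forEach
    }
}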
