I have 6 reports in my web application. Each report uses a huge query with around 15 table joins, and the query results are then looped over again to do some calculations before the report is presented to the user.
This makes a report take a long time to load. I am using MySQL with Java.
What is the best way to fix this issue?
If caching is a good option, what choices are available for it?
I am planning to create a table, insert all the required data into it, and have the reports read from that table. If that is feasible, what is the best way to load data into that table?
Can MongoDB or another NoSQL database fix this issue?
Or is there a standard way to do this kind of thing?
Put EXPLAIN in front of your query to find out whether your joins are using any indexes. Hopefully you usually join your tables in a similar way, so adding the right indexes should speed things up quite a bit.
For example, if you had,
SELECT carrots FROM veggies
JOIN fruits ON fruits.color = veggies.color
WHERE veggies.weight = 0.5
You would want to add an index to the color column in the fruits table, and an index to the weight column of the veggies table.
I have a Java application that uses Spring Data JPA to query our Oracle database. For one use case, I need to fetch all the records in a table. The table has a record count of 400,000 and it might grow in the near future. I don't feel comfortable pulling all the records into the JVM, since we don't know how large they can be. So I want to configure the code to fetch a specific number of records at a time, say 50,000, and process them before moving on to the next 50,000. Is there a way to achieve this with JPA?
I came across the JDBC property hibernate.jdbc.fetch_size, which can be used with Hibernate. What I am trying to understand is: if I use repository.findAll() returning List<Entity>, how can a fetch size work, given that the List will hold all the entities? I was also looking into repository methods returning Stream<>, but I am not sure whether I have to use that. Please suggest whether there is a better solution for this use case.
Thanks
With JPA you can use pagination: you tell the repository how many results should be on one page (e.g. 50,000).
For more information, see https://www.baeldung.com/jpa-pagination
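To make the page-by-page idea concrete, here is a minimal sketch of the processing loop. PageSource and PagedProcessor are made-up names for illustration, and the in-memory source only stands in for a real repository call. With Spring Data, and assuming your repository accepts a Pageable, the fetch would be something like repository.findAll(PageRequest.of(page, size)).getContent(), with a stable sort order so pages don't overlap.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

// Hypothetical stand-in for a paged repository call.
interface PageSource<T> {
    List<T> fetch(int pageNumber, int pageSize);
}

class PagedProcessor {
    // Fetches one page at a time and hands it to the handler, so this loop
    // holds at most pageSize records at once. Returns the number processed.
    static <T> long processAll(PageSource<T> source, int pageSize, Consumer<List<T>> handler) {
        if (pageSize <= 0) {
            throw new IllegalArgumentException("pageSize must be positive");
        }
        long total = 0;
        int page = 0;
        while (true) {
            List<T> batch = source.fetch(page, pageSize);
            if (batch.isEmpty()) {
                break; // no more records
            }
            handler.accept(batch);
            total += batch.size();
            if (batch.size() < pageSize) {
                break; // short page, so this was the last one
            }
            page++;
        }
        return total;
    }

    // In-memory source, used only for the demo below (assumption, not a real DB).
    static PageSource<Integer> inMemory(List<Integer> rows) {
        return (page, size) -> {
            int from = Math.min(page * size, rows.size());
            int to = Math.min(from + size, rows.size());
            return new ArrayList<>(rows.subList(from, to));
        };
    }

    public static void main(String[] args) {
        List<Integer> rows = new ArrayList<>();
        for (int i = 0; i < 25; i++) {
            rows.add(i);
        }
        long n = processAll(inMemory(rows), 10,
                batch -> System.out.println("batch of " + batch.size()));
        System.out.println("processed " + n);
    }
}
```

Note that with JPA the persistence context may still keep references to loaded entities, so you would probably also want to clear it between pages.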
I have a problem regarding the ResultSet of a large database (MySQL, Java 1.7).
The task is to perform a transformation of all the entries of one column into another database.
(e.g. divide every number by three and write them into another database)
As the database contains about 70 columns and a few million rows, my first approach would have been to run a SELECT * and parse the ResultSet column by column.
Unfortunately I found no way to parse it this way, as the designated way is to go through it row by row (while (rs.next()) {} etc.).
I don't like this approach, as it would create 70 large arrays at once; I would rather hold only one at a time to reduce memory usage.
So here are my main questions:
Is there a way?
Should I either create a query for every column and parse them (one array at a time but 70 queries) or
Should I just get the whole ResultSet and parse it row by row, writing them into 70 arrays?
Greetings and thanks in advance!
Why not just page your queries? Pull out 'n' rows at a time, perform the transformation, and then write them into the new database.
This means you don't pull everything up in one query/iteration and then write the whole lot in one go, and you don't have the inefficiencies of working row-by-row.
My other comment is that this is perhaps premature optimisation. Have you tried loading the whole dataset and seeing how much memory it takes? If it's on the order of tens or even hundreds of megabytes, I would expect the JVM to handle that easily.
I'm assuming your transformation needs to be done in Java. If you can possibly do it in SQL, then doing it entirely within the database is likely to be even more efficient.
Why not do it in MySQL alone?
Use this query:
create table <new_table_name> as select <column_name_on_which_you_want_transformation>/3 from <source_table_name>;
I am playing around (as a learning experience) with writing an analytics system using the Play! Framework 2 (Java).
I want to write efficient code, and because of this I am struggling to decide on the following:
For every view a page gets, a record is added specifying the website (example.org), the page (/index.html) and the date it was viewed. As you can guess, the number of rows is going to be huge.
To use the data, I then select all rows where the website is "example.org", loop through the results to build a hash map of each date and how many views it had on that date, and use this to build a graph.
There must be a better way of doing this.
For example, rather than having a row per view, would it be better to update an existing record, adding one view to it?
Any assistance would be appreciated.
Thank you
You can just add some more conditions (like the date) to your WHERE clause and then perform a COUNT over the result. This way you get the result directly from your database.
The query would look like:
SELECT COUNT(*)
FROM YOUR_TABLE
WHERE SITE = 'thesite'
AND DATE = '<date>'
GROUP BY SITE, DATE
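The query above gives one count for one date. For the graph in the question you would probably want one row per day instead, which means dropping the date filter and grouping by the date column. Below is a rough JDBC sketch of that. The table and column names (page_views, site, view_date) are my assumptions, not taken from the question, and view_date is assumed to be a non-null DATE column.

```java
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.time.LocalDate;
import java.util.LinkedHashMap;
import java.util.Map;

class DailyViews {
    // Hypothetical schema: page_views(site, page, view_date).
    // The database does the counting; Java only receives one row per day.
    static final String SQL =
            "SELECT view_date, COUNT(*) AS views FROM page_views"
          + " WHERE site = ? GROUP BY view_date ORDER BY view_date";

    // Returns date -> number of views, in date order.
    static Map<LocalDate, Long> load(Connection conn, String site) throws SQLException {
        Map<LocalDate, Long> counts = new LinkedHashMap<>();
        try (PreparedStatement ps = conn.prepareStatement(SQL)) {
            ps.setString(1, site);
            try (ResultSet rs = ps.executeQuery()) {
                while (rs.next()) {
                    counts.put(rs.getDate("view_date").toLocalDate(), rs.getLong("views"));
                }
            }
        }
        return counts;
    }
}
```

The resulting map can be fed to the graph directly, so the per-row loop in Java goes away.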
There must be a better way of doing this,
The web server logs HTTP requests. Most analytics systems use the web server logs.
Since you mentioned that you're doing this to learn, you're gathering statistics in the most flexible way possible.
My only suggestion would be to remove all indexes from the statistics table that your web applications are writing to. Make a copy of the statistics table for generating the statistics. The copy would have all of the necessary indexes.
This way, you get the fastest writes, because there are no indexes to update.
If necessary, you can have a primary index or clustering index on the write table.
I have a customer with a very small set of data and records that I'd normally just serialize to a data file and be done, but they want to run extra reports and have room to expand down the road and do things their own way. A MySQL database came up, so I'm adapting their Java POS (point of sale) system to work with it.
I've done this before and here was my approach in a nutshell for one of the tables, say Customers:
I set up a loop to store the primary keys in an ArrayList, then set up a form to go from one record to the next, running SQL queries based on the PK. The query would pull down the fname, lname, address, etc. and fill in the fields on the screen.
I thought it might be a little clunky running a SQL query each time they click Next, so I'm looking for another approach to this problem. Any help is appreciated! I don't need exact code or anything, just some concepts will do fine.
Thanks!
I would say the solution you suggest is not very good, not only because you run a SQL query every time a button is pressed, but also because you are iterating over primary keys, which are probably not sorted in any meaningful order.
What you want is to retrieve a certain number of records, sorted sensibly (by first/last name or something), and keep them as a kind of cache in your ArrayList or something similar. This can be done quite easily with SQL. When the user starts iterating over the results by pressing "Next", you can start loading more records in the background.
The key to keeping the UI usable is to load some records before the user actually requests them, so latency stays small, while keeping in mind that you also don't want to load the whole database at once.
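Here is a rough sketch of that cache-ahead idea. RecordLoader and RecordBrowser are names I made up, and the loader stands in for a query along the lines of ORDER BY last_name, first_name LIMIT ? OFFSET ?. For simplicity it loads the next chunk synchronously when the user gets within half a chunk of the end of the cache; in a real UI you would move that call to a background thread. The cache also only grows, which should be fine for a small data set.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical loader: returns up to `limit` records starting at `offset`,
// always in the same sort order.
interface RecordLoader<T> {
    List<T> load(int offset, int limit);
}

class RecordBrowser<T> {
    private final RecordLoader<T> loader;
    private final int chunkSize;
    private final List<T> cache = new ArrayList<>();
    private boolean exhausted = false;
    private int position = -1; // index of the record currently shown

    RecordBrowser(RecordLoader<T> loader, int chunkSize) {
        this.loader = loader;
        this.chunkSize = chunkSize;
    }

    // Moves to the next record, or returns null when there are no more.
    T next() {
        // Top up the cache before the user actually reaches its end.
        if (!exhausted && position + 1 >= cache.size() - chunkSize / 2) {
            loadMore();
        }
        if (position + 1 >= cache.size()) {
            return null;
        }
        return cache.get(++position);
    }

    // Moves back one record, or returns null at the first record.
    T previous() {
        if (position <= 0) {
            return null;
        }
        return cache.get(--position);
    }

    int cachedCount() {
        return cache.size();
    }

    private void loadMore() {
        List<T> chunk = loader.load(cache.size(), chunkSize);
        cache.addAll(chunk);
        if (chunk.size() < chunkSize) {
            exhausted = true; // short chunk, nothing left to load
        }
    }

    public static void main(String[] args) {
        List<String> names = Arrays.asList("Adams", "Baker", "Clark", "Davis", "Evans");
        RecordBrowser<String> browser = new RecordBrowser<String>(
                (offset, limit) -> names.subList(Math.min(offset, names.size()),
                                                 Math.min(offset + limit, names.size())),
                2);
        for (String name = browser.next(); name != null; name = browser.next()) {
            System.out.println(name);
        }
    }
}
```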
Take a look at indexing your database. http://www.informit.com/articles/article.aspx?p=377652
Use JPA with the built-in Hibernate provider. If you are not familiar with one or both, download NetBeans; it includes a very easy-to-follow tutorial you can use to get up to speed. Managing lists of objects is trivial with the new JPA and you won't find yourself reinventing the wheel.
The key concept here is pagination.
Let's say you set your page size to 10. This means you select 10 records from the database, in a certain order, so your query should have an ORDER BY clause and a LIMIT clause at the end. You use this result set to display the form while the user navigates with the Previous/Next buttons.
When the user navigates out of the page, you fetch another page.
https://www.google.com/search?q=java+sql+pagination
I'm using displaytag to build tables with data from my db. This works well if the requested list isn't that big but if the list size grows over 2500 entries, fetching the result list takes very long (more than 5 min.). I was wondering if this behavior is normal.
How do you handle big lists / queries which return big results?
This article links to an example app of how to go about solving the problem. Displaytag expects to be passed a full dataset to create paging links and handle sorting. This kind of breaks the idea of paging externally on the data and fetching only those rows that are asked for (as the user pages to them). The project linked in the article describes how to go about setting this type of thing up.
If you're working with a large database, you could also have a problem executing your query. I assume you have ruled this out. If not, you have the SQL as mentioned earlier - I would run it through the DB2 query analyzer to see if there are any DB bottlenecks. The next step up the chain is to run a test of the Hibernate/DAO call in a unit test without displaytag in the mix. Again, from how you've worded things, it sounds like you've already done this.
Displaytag hauls everything into memory and stores it there (in the session). Hibernate also does that. You don't want to have the entire DB table contents in memory at once (however, if the slowdown already begins at 2500 rows, it looks more like a matter of a badly optimized SQL query or DB table; 2500 rows should be peanuts for a decent DB, but OK, that's another story).
Rather, create an HTML table yourself with a little help from JSTL c:forEach and a shot of EL. Keep one or two request parameters in the background in input type="hidden": the first row to be displayed (firstrow) and optionally the number of rows to be displayed at once (rowcount).
Then, in your DAO class, just do a SELECT stuff FROM data LIMIT rowcount OFFSET firstrow or something like that, depending on the DB used. In MySQL and PostgreSQL you can use the LIMIT and/or OFFSET clause like that. In Oracle you'll need to fire a subquery. In MSSQL and DB2 you'll need to create a stored procedure. You can do that with HQL.
Then, to page through the table, just have a bunch of buttons which instruct the server-side code to increment or decrement firstrow by rowcount each time. Just do the math.
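The "math" could look roughly like the sketch below. PageWindow is a name I made up; firstRow and rowCount correspond to the two hidden request parameters, with firstRow treated as a zero-based offset. The total row count passed to next() is assumed to come from a separate SELECT COUNT(*) query.

```java
// Tracks which slice of the result is shown and computes the neighbouring slices.
class PageWindow {
    final int firstRow; // zero-based offset of the first row shown
    final int rowCount; // number of rows per page

    PageWindow(int firstRow, int rowCount) {
        if (rowCount <= 0) {
            throw new IllegalArgumentException("rowCount must be positive");
        }
        this.firstRow = Math.max(0, firstRow); // never go before the first row
        this.rowCount = rowCount;
    }

    // Window for the "Next" button; stays put if already on the last page.
    PageWindow next(long totalRows) {
        long candidate = (long) firstRow + rowCount;
        return candidate < totalRows ? new PageWindow((int) candidate, rowCount) : this;
    }

    // Window for the "Previous" button; the constructor clamps at row 0.
    PageWindow previous() {
        return new PageWindow(firstRow - rowCount, rowCount);
    }

    public static void main(String[] args) {
        PageWindow w = new PageWindow(0, 10).next(25);
        // Bind these to "... LIMIT ? OFFSET ?" as (rowCount, firstRow).
        System.out.println("LIMIT " + w.rowCount + " OFFSET " + w.firstRow);
    }
}
```

Clamping on the server side matters because the hidden inputs come from the client and can hold any value.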
Edit: you commented that you're using DB2. I've done a bit of research and it appears that you can use the UDB OLAP function ROW_NUMBER() for this:
SELECT id, colA, colB, colC
FROM (
SELECT
ROW_NUMBER() OVER (ORDER BY id) AS row, id, colA, colB, colC
FROM
data
) AS temp_data
WHERE
row BETWEEN 1 AND 10;
This example should return the first 10 rows from the data table. You can parameterize this query so that you can reuse it for every page. This is more efficient than loading the entire table into Java's memory. Also ensure that the table is properly indexed.
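A parameterized version might look something like the following JDBC sketch. RowNumberPager is a made-up name, pages are assumed to be numbered from 1, and I renamed the alias from row to rn as a precaution, since ROW is a keyword in some databases. The column types aren't known here, so the values are read with getObject. I haven't run this against DB2.

```java
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.util.ArrayList;
import java.util.List;

class RowNumberPager {
    // The query from above with the BETWEEN bounds turned into parameters.
    static final String PAGE_SQL =
            "SELECT id, colA, colB, colC FROM ("
          + " SELECT ROW_NUMBER() OVER (ORDER BY id) AS rn, id, colA, colB, colC FROM data"
          + ") AS temp_data WHERE rn BETWEEN ? AND ? ORDER BY rn";

    // ROW_NUMBER() starts at 1 and BETWEEN is inclusive on both ends.
    static int firstRow(int page, int pageSize) {
        return (page - 1) * pageSize + 1;
    }

    static int lastRow(int page, int pageSize) {
        return page * pageSize;
    }

    // Loads one page; `page` is 1-based.
    static List<Object[]> fetchPage(Connection conn, int page, int pageSize) throws SQLException {
        List<Object[]> rows = new ArrayList<>();
        try (PreparedStatement ps = conn.prepareStatement(PAGE_SQL)) {
            ps.setInt(1, firstRow(page, pageSize));
            ps.setInt(2, lastRow(page, pageSize));
            try (ResultSet rs = ps.executeQuery()) {
                while (rs.next()) {
                    rows.add(new Object[] {
                        rs.getObject("id"), rs.getObject("colA"),
                        rs.getObject("colB"), rs.getObject("colC")
                    });
                }
            }
        }
        return rows;
    }
}
```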