This is not a rare question on the net, but I did some optimization work on the MySQL server to solve this problem and got no results. So first of all, I use Maven's package mysql:mysql-connector-java:6.0.6.
I simply try to run this code:
Connection mysqlConnection = null;
PreparedStatement valuesStatement = null;
ResultSet cursor = null;
try {
    mysqlConnection = DriverManager.getConnection(DatabaseUtils.mysqlUrl, DatabaseUtils.mysqlUser, DatabaseUtils.mysqlPassword);
    valuesStatement = mysqlConnection.prepareStatement("SELECT * FROM `test` ORDER BY `id`");
    cursor = valuesStatement.executeQuery();
    double value = 0;
    if (cursor.next())
        value = cursor.getDouble("value");
} catch (SQLException sqlEx) {
    sqlEx.printStackTrace();
} finally {
    try {
        if (cursor != null) cursor.close();
        if (valuesStatement != null) valuesStatement.close();
    } catch (SQLException ignored) {}
}
I have a lot of records in the table, about a million, and roughly a thousand more are added every day. So I was very surprised when this simple example took 30 seconds to execute. I googled my problem and found only "use a pool", "tune the MySQL server", "try EXPLAIN SELECT". But I noticed that the execution time is related to the row count, so I looked into the driver's code and found this:
TextResultsetReader::read():
while (true) {
    if (row == null) {
        rows = new ResultsetRowsStatic(rowList, cdef);
        break;
    }
    if (maxRows == -1 || rowList.size() < maxRows) {
        rowList.add(row);
    }
    row = (ResultsetRow) this.protocol.read(ResultsetRow.class, trf);
}
This means that even if I want to fetch only one row, the driver fetches all queried rows and gives me the first of them. The manuals suggest using setFetchSize to fetch only n records, but it doesn't work: the driver code fetches all the data anyway. Then I found that there are two result-set implementations: ResultsetRowsStatic and ResultsetRowsStreaming. The second one seems to fetch data only when I actually request it. How do you use ResultsetRowsStreaming? I found the answer only in the code: the fetchSize parameter must equal -2147483648. I tried it and it worked! The execution time of executeQuery() is now about 0.0007 seconds, which is very fast for me. But wait... my script still takes 30 seconds. Why? I debugged the code after the query execution; there are only two close() calls after it. What could go wrong there? And indeed, cursor.close() takes all the remaining time. I looked into the library code again and reached ResultsetRowsStreaming::close():
boolean hadMore = false;
int howMuchMore = 0;
synchronized (mutex) {
    while (this.next() != null) {
        hadMore = true;
        ++howMuchMore;
        if (howMuchMore % 100 == 0) {
            Thread.yield();
        }
    }
    if (conn != null) {
        if (!((Boolean) this.protocol.getPropertySet().getBooleanReadableProperty("clobberStreamingResults").getValue()).booleanValue()
                && ((Integer) this.protocol.getPropertySet().getIntegerReadableProperty("netTimeoutForStreamingResults").getValue()).intValue() > 0) {
            int oldValue = this.protocol.getServerSession().getServerVariable("net_write_timeout", 60);
            this.protocol.clearInputStream();
            try {
                this.protocol.sqlQueryDirect((StatementImpl) null, "SET net_write_timeout=" + oldValue,
                        (String) this.protocol.getPropertySet().getStringReadableProperty("characterEncoding").getValue(),
                        (PacketPayload) null, -1, false, (String) null, (ColumnDefinition) null,
                        (GetProfilerEventHandlerInstanceFunction) null, this.resultSetFactory);
            } catch (Exception var9) {
                throw ExceptionFactory.createException(var9.getMessage(), var9, this.exceptionInterceptor);
            }
        }
        if (((Boolean) this.protocol.getPropertySet().getBooleanReadableProperty("useUsageAdvisor").getValue()).booleanValue() && hadMore) {
            ProfilerEventHandler eventSink = ProfilerEventHandlerFactory.getInstance(conn.getSession());
            eventSink.consumeEvent(new ProfilerEventImpl(0, "", this.owner.getCurrentCatalog(), this.owner.getConnectionId(),
                    this.owner.getOwningStatementId(), -1, System.currentTimeMillis(), 0L, Constants.MILLIS_I18N,
                    (String) null, (String) null,
                    Messages.getString("RowDataDynamic.2") + howMuchMore + Messages.getString("RowDataDynamic.3")
                            + Messages.getString("RowDataDynamic.4") + Messages.getString("RowDataDynamic.5")
                            + Messages.getString("RowDataDynamic.6") + this.owner.getPointOfOrigin()));
        }
    }
}
This code unconditionally fetches all the remaining data, only to log how many records I did not fetch. Really weird. It would be justified if a logger were attached, but in my case this code spends 30 seconds counting unfetched rows and then... does nothing with the count. And I cannot fix this problem because there is no parameter that tells the code not to count the rows.
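For reference, here is roughly how I enabled the streaming mode described above (a minimal sketch; the statement must be forward-only and read-only, and the magic fetch size is Integer.MIN_VALUE, i.e. -2147483648):

// Streaming mode in Connector/J: a forward-only, read-only statement with
// fetchSize = Integer.MIN_VALUE makes the driver use ResultsetRowsStreaming
// instead of ResultsetRowsStatic
PreparedStatement ps = mysqlConnection.prepareStatement(
        "SELECT * FROM `test` ORDER BY `id`",
        ResultSet.TYPE_FORWARD_ONLY,
        ResultSet.CONCUR_READ_ONLY);
ps.setFetchSize(Integer.MIN_VALUE); // -2147483648
ResultSet cursor = ps.executeQuery(); // returns almost immediately now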
Now I don't know what to do next. The query time is far too slow for me. For comparison, the MySQL driver for PHP executes this query in 0.0004-0.001 seconds.
So, people who use mysql-connector for Java: have you run into these problems? If not, could you post examples of how to work around them? Maybe you use other connectors. Please tell me what to do.
Your SQL query says
SELECT * FROM test ORDER BY id
You are, with that query, instructing your MySQL server to serialize every column of every row of your test table and send it to your Java program. So, MySQL obeys. You have a large table. So your instruction to MySQL takes time. And yes, the more rows in your table the longer it takes. This is not a problem with JDBC or the driver; it's a problem with the SQL you're using.
It seems from your sample code that you want one column -- named value -- from one row -- the first one -- in your table. You could accomplish that using this SQL statement:
SELECT value FROM test ORDER BY id LIMIT 1
If your id column is your table's primary key, this will be fast.
The whole point of SQL is to allow your tables to contain so many rows that it's unreasonable to fetch them all into your Java (or other) program in a short amount of time. That's why SQL has WHERE and LIMIT clauses.
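In JDBC terms that might look like this (a minimal sketch, reusing the connection settings from the question):

try (Connection con = DriverManager.getConnection(DatabaseUtils.mysqlUrl, DatabaseUtils.mysqlUser, DatabaseUtils.mysqlPassword);
     PreparedStatement ps = con.prepareStatement("SELECT value FROM test ORDER BY id LIMIT 1");
     ResultSet rs = ps.executeQuery()) {
    // Only one row crosses the wire; the server does the ordering via the index
    double value = rs.next() ? rs.getDouble("value") : 0;
} catch (SQLException e) {
    e.printStackTrace();
}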
Related
I need your expertise on performance bottlenecks/improvements in the following code.
I have a huge collection (~2.5 million objects) of INTEREST_RATES to traverse repeatedly, fetching and returning lists of matching entries. My current solution is an HSQL in-memory database:
INTEREST_RATE table structure:
CREATE MEMORY TABLE INTEREST_RATES (
    EFFECTIVE_DATE DATE NOT NULL,
    INTEREST_RATE DOUBLE NOT NULL,
    INTEREST_RATE_CD INT NOT NULL,
    INTEREST_RATE_TERM INT NOT NULL,
    INTEREST_RATE_TERM_MULT VARCHAR(5) NOT NULL,
    TERM_IN_DAYS DOUBLE NOT NULL,
    PRIMARY KEY (EFFECTIVE_DATE, INTEREST_RATE_CD, INTEREST_RATE_TERM, INTEREST_RATE_TERM_MULT)
)

CREATE INDEX dtidx ON INTEREST_RATES (EFFECTIVE_DATE, INTEREST_RATE_CD)
Query:
SELECT * from INTEREST_RATES where INTEREST_RATE_CD = ? and
EFFECTIVE_DATE = (SELECT MAX(EFFECTIVE_DATE) from INTEREST_RATES
where INTEREST_RATE_CD = ? AND EFFECTIVE_DATE <= ?)
So I am trying to fetch the latest available RATES for a specific INTEREST_RATE_CD, given an upper date limit.
Java part to execute the query:
PreparedStatement p = con.prepareStatement(sql);
p.setLong(1, intRateCd);
p.setLong(2, intRateCd);
p.setDate(3, someDate);
ResultSet r = p.executeQuery();
return resultSetToList(r);
Java main loop using Futures/multithreading:
ExecutorService executor = Executors.newFixedThreadPool(4);
CompletionService<TestResult> completionService = new ExecutorCompletionService<>(executor);
long futureCount = 0;
while (deals.next()) // deals is a ScrollableResults set from Hibernate
{
    IDealEntity deal = (IDealEntity) deals.get()[0];
    // These tasks contain the INTEREST_RATE query action
    QueryTask task = new QueryTask(some params...);
    completionService.submit(task);
}
try
{
    while (futureCount < dealCount)
    {
        Future<TestResult> result = completionService.take();
        TestResult testResult = result.get();
        futureCount++;
        testResults.add(testResult);
    }
    executor.shutdown();
    executor.awaitTermination(Long.MAX_VALUE, TimeUnit.SECONDS);
}
catch (Exception ex)
{
    ex.printStackTrace();
}
Now, as I try to improve the performance and find mistakes in my code, my questions are:
Can you come up with anything faster than an in-memory DB to repeatedly fetch objects following the query logic? Is there any better / faster data structure?
So far, HSQL was the fastest thing I could come up with. I also tried H2, which was way slower.
Interestingly enough, my experiment using multithreading and an ExecutorService doesn't really change anything performance-wise.
There's nearly no difference between a ThreadPool of size 1 and 4 threads...
Any tips or ideas are welcome!
I don't think an in-memory database is a good approach to solve this. The most important thing is to avoid full table scans, and it seems to me that you have the indexes right. It would be useful to see real timings, which should be in the millisecond range.
If this is not enough, you could load the whole structure into memory as nested indexed collections or hash tables and use Java to traverse them directly.
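A minimal sketch of that idea, assuming a hypothetical Rate value object carrying the columns from the question. Rates are grouped by INTEREST_RATE_CD, and per code kept in a TreeMap sorted by EFFECTIVE_DATE, so the "latest date <= limit" lookup becomes a single floorEntry call:

import java.time.LocalDate;
import java.util.*;

class RateLookup {
    // Hypothetical value object mirroring the table columns
    static class Rate {
        long interestRateCd;
        LocalDate effectiveDate;
        double interestRate;
        // ... remaining columns elided
    }

    // Rates grouped by code; per code, a TreeMap keyed by effective date
    private final Map<Long, TreeMap<LocalDate, List<Rate>>> byCode = new HashMap<>();

    void add(Rate r) {
        byCode.computeIfAbsent(r.interestRateCd, k -> new TreeMap<>())
              .computeIfAbsent(r.effectiveDate, k -> new ArrayList<>())
              .add(r);
    }

    // Equivalent of the question's query: the latest rates for a code
    // with EFFECTIVE_DATE <= limit; floorEntry is an O(log n) lookup
    List<Rate> latest(long code, LocalDate limit) {
        TreeMap<LocalDate, List<Rate>> dates = byCode.get(code);
        if (dates == null) return Collections.emptyList();
        Map.Entry<LocalDate, List<Rate>> entry = dates.floorEntry(limit);
        return entry == null ? Collections.emptyList() : entry.getValue();
    }
}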
My view is that when we deal with a very large amount of data, an in-memory database can create problems, as it will consume a very large amount of memory, unless you use a distributed in-memory database.
Another alternative, if you are not using a distributed in-memory database, could be to use a cache with well-suited eviction policies, etc.
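For example, a size-bounded LRU cache can be built from a LinkedHashMap in access order (a tiny sketch; the capacity figure is arbitrary):

import java.util.LinkedHashMap;
import java.util.Map;

// LRU cache: an access-ordered LinkedHashMap that evicts the eldest entry
class LruCache<K, V> extends LinkedHashMap<K, V> {
    private final int capacity;

    LruCache(int capacity) {
        super(16, 0.75f, true); // accessOrder = true
        this.capacity = capacity;
    }

    @Override
    protected boolean removeEldestEntry(Map.Entry<K, V> eldest) {
        return size() > capacity; // eviction policy: drop least recently used
    }
}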
I am trying to profile neo4j database hits using the following code
public int calculateHits(List<ExecutionPlanDescription> list) {
    int hits = 0;
    int head = 0;
    if (list.isEmpty())
        return 0;
    if (list.get(head).hasProfilerStatistics()) {
        hits += list.get(head).getProfilerStatistics().getDbHits();
        System.out.println(hits);
    }
    hits += calculateHits(list.get(head).getChildren()); // recurse over the children of the head
    list.remove(head); // remove the head to recurse on the remainder of the list
    hits += calculateHits(list);
    return hits;
}
in main I call it this way
Result result = neo4jGraph.execute(query);
int hits = calculateHits(result.getExecutionPlanDescription().getChildren());
However, the method always returns 0 hits. I logged the names of the query execution plans and found EagerAggregation, Filter, Expand(All), Filter, and NodeByLabelScan plans. But it seems that profiler statistics do not exist for any of the plans, as the condition is never entered and the hit count is never increased.
Is there any problem in the code, or do I need to do some configuration first to profile the DB hits? I appreciate your help. Thanks!
I finally figured out the problem! I have to use PROFILE in the query to fill the statistics and get the DB hits. So the query should be:
PROFILE MATCH (e:Event)-[r:has_metadata]->(s:EventMetadata) WHERE s.type STARTS WITH 'ELec' AND s.eventLocation IN ["GW", "GW32", "FW", "FW29", "SW", "SW00"] AND e.date = "1/11/2016" RETURN SUM(e.reading)
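In the Java code from the question, that becomes (a sketch; note that the result has to be fully consumed before the profiler statistics are populated):

// Prefixing the query with PROFILE makes the planner record db hits
Result result = neo4jGraph.execute("PROFILE " + query);
result.resultAsString(); // exhaust the result so the statistics are filled in
int hits = calculateHits(result.getExecutionPlanDescription().getChildren());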
There's a column I want to retrieve and insert into another table.
For example, below is the first table whose values I want to retrieve:
Table1
Records
1 ABC Singapore
2 DEF Vietnam
I retrieve the above column values from Table1, then insert them into another table as below:
Table 2
ID Name Country
1 ABC Singapore
2 DEF Vietnam
Currently, I can do this in Java: I first retrieve the records, then split the values and insert them. However, I want to do it by batch or with pagination, for better performance when Table1 has millions of records to retrieve and insert into Table2.
Any pointer showing me how to use pagination in my case would be appreciated.
I'm using MSSQL 2008.
If you need to do that in code (and not in SQL, which should be easier even with multiple delimiters), what you probably want is batched inserts with a proper batch size, combined with a good fetch size on your select:
// Prepare statements first
try (PreparedStatement select = con.prepareStatement("SELECT * FROM SOURCE_TABLE");
     PreparedStatement insert = con.prepareStatement("INSERT INTO TARGET_TABLE(col1, col2, col3) VALUES (?,?,?)")) {

    // Define parameters for the SELECT
    select.setFetchDirection(ResultSet.FETCH_FORWARD);
    select.setFetchSize(10000);

    int rowCnt = 0;
    try (ResultSet rs = select.executeQuery()) {
        while (rs.next()) {
            String row = rs.getString(1);
            String[] split = row.split(" |\\$|\\*"); // However you want to do that
            // Todo: Error handling for array length
            // Todo: Type conversions, if target data is not a string type
            insert.setString(1, split[0]);
            insert.setString(2, split[1]);
            insert.setString(3, split[2]);
            insert.addBatch();

            // Submit insert in batches of a good size:
            if (++rowCnt % 10000 == 0) {
                int[] success = insert.executeBatch();
                // Todo: Check if that worked.
            }
        }
        // Handle remaining inserts
        int[] success = insert.executeBatch();
        // Todo: Check if that worked.
    }
} catch (SQLException e) {
    // Handle your exceptions
}
When calculating "good" fetch and batch sizes you'll want to consider some parameters:
Fetchsize impacts memory consumption in your client. If you have enough of that you can make it big.
Committing an insert of millions of rows will take some time. Depending on your requirements you might want to commit the insert transaction every once in a while (every 250.000 inserts?)
Think about your transaction isolation: make sure auto-commit is turned off, as committing each insert will make most of the batching gains go away (see the sketch after this list).
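Putting the last two points together, the commit cadence might look like this (a sketch reusing con, rs and insert from the code above; the intervals are just the example figures from this list):

con.setAutoCommit(false); // batching gains vanish if every insert auto-commits
int rowCnt = 0;
while (rs.next()) {
    // ... set parameters and insert.addBatch() as shown above ...
    if (++rowCnt % 10000 == 0) {
        insert.executeBatch(); // send one batch to the server
    }
    if (rowCnt % 250000 == 0) {
        con.commit(); // keep the transaction from growing without bound
    }
}
insert.executeBatch(); // flush the remaining rows
con.commit();          // commit the tail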
I'm having a problem with a Java OutOfMemoryError. The program basically looks at MySQL tables that are running on MySQL Workbench, queries them to get out certain information, and then puts it in CSV files.
The program works just fine with a smaller data set, but once I use a larger data set (hours of logging information as opposed to perhaps 40 minutes) I get this error, which tells me that the problem comes from having a huge data set and the information not being handled well by the program, or it not being possible to handle this amount of data the way that I have.
Setting the Java VM argument -Xmx1024m worked for a slightly larger data set, but I need it to handle even bigger ones and it still gives the error.
Here is the method that I am quite sure is the cause of the problem:
// CSV is csvwriter (external lib), sment are Statements, rs is a ResultSet
public void pidsforlog() throws IOException
{
    String[] procs;
    int count = 0;
    String temp = "";
    System.out.println("Commence getting PID's out of Log");
    try {
        sment = con.createStatement();
        sment2 = con.createStatement();
        String query1a = "SELECT * FROM log, cpuinfo, memoryinfo";
        rs = sment.executeQuery(query1a);
        procs = new String[countThrough(rs)];

        // SIMPLY GETS UNIQUE PROCESSES OUT OF TABLES AND STORES IN ARRAY
        while (rs.next()) {
            temp = rs.getString("Process");
            if (Arrays.asList(procs).contains(temp)) {
            } else {
                procs[count] = temp;
                count++;
            }
        }

        // BELIEVE THE PROBLEM LIES BELOW HERE. SIZE OF THE RESULTSET TOO BIG?
        for (int i = 0; i < procs.length; i++) {
            if (procs[i] == null) {
            } else {
                String query = "SELECT DISTINCT * FROM log, cpuinfo, memoryinfo WHERE log.Process = " + "'" + procs[i] + "'"
                        + " AND cpuinfo.Process = " + "'" + procs[i] + "'"
                        + " AND memoryinfo.Process = " + "'" + procs[i] + "' AND log.Timestamp = cpuinfo.Timestamp = memoryinfo.Timestamp";
                System.out.println(query);
                rs = sment.executeQuery(query);
                writer = new CSVWriter(new FileWriter(procs[i] + ".csv"), ',');
                writer.writeAll(rs, true);
                writer.flush();
            }
        }
        writer.close();
    } catch (SQLException e) {
        notify("Error pidslog", e);
    }
} // end of method
Please feel free to ask if you want source code or more information as I'm desperate to get this fixed!
Thanks.
SELECT * FROM log, cpuinfo, memoryinfo will surely give a huge result set. It is the Cartesian product of all rows in all 3 tables.
Without seeing the table structure (or knowing the desired result) it's hard to pinpoint a solution, but I suspect that you either want some kind of join conditions to limit the result set, or a UNION along the lines of:
SELECT Process FROM log
UNION
SELECT Process FROM cpuinfo
UNION
SELECT Process FROM memoryinfo
...which will just give you all distinct values for Process in all 3 tables.
Your second SQL statement also looks a bit strange:
SELECT DISTINCT *
FROM log, cpuinfo, memoryinfo
WHERE log.Process = #param1
AND cpuinfo.Process = #param1
AND memoryinfo.Process = #param1
AND log.Timestamp = cpuinfo.Timestamp = memoryinfo.Timestamp
Looks like you're trying to select from all 3 logs simultaneously, but ending up with another cartesian product. Are you sure you're getting the result set you're expecting?
You could limit the results returned by your SQL queries with the LIMIT statement.
For example:
SELECT * FROM `your_table` LIMIT 100
This will return the first 100 results
SELECT * FROM `your_table` LIMIT 100, 200
This skips the first 100 rows and returns the next 200 (i.e., rows 101 to 300).
Obviously you can iterate with those values to get through all the rows in the database, no matter how many there are.
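Iterating the pages from Java could look like this (a sketch, assuming an open connection con and the your_table from above):

try {
    int pageSize = 100;
    boolean more = true;
    for (int offset = 0; more; offset += pageSize) {
        try (PreparedStatement ps = con.prepareStatement(
                "SELECT * FROM `your_table` LIMIT ?, ?")) {
            ps.setInt(1, offset);   // rows to skip
            ps.setInt(2, pageSize); // rows to return
            try (ResultSet rs = ps.executeQuery()) {
                more = false;
                while (rs.next()) {
                    more = true;
                    // process the current row here
                }
            }
        }
    }
} catch (SQLException e) {
    e.printStackTrace();
}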
I think you are loading too much data into memory at the same time. Try using OFFSET and LIMIT in your SQL statement so that you can avoid this problem.
Your Java code is doing things that the database could do more efficiently. From query1a, it looks like all you really want is the unique processes. select distinct Process from ... should be sufficient to do that.
Then, think carefully about what table or tables are needed in that query. Do you really need log, cpuinfo, and memoryinfo? As Joachim Isaksson mentioned, this is going to return the Cartesian product of those three tables, giving you x * y * z rows (where x, y, and z are the row counts in each of those three tables) and a + b + c columns (where a, b, and c are the column counts in each of the tables). I doubt that's what you want or need. I assume you could get those unique processes from one table, or a union (rather than join) of the three tables.
Lastly, your second loop and query are essentially doing a join, something again better and more efficiently left to the database.
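A sketch of what that could look like, with the join and the per-process filter pushed into SQL (the join columns are my assumption based on your second query; CSVWriter is the same opencsv class you already use; exception handling elided):

// One query per process, letting MySQL do the join; the '?' placeholder
// also avoids the string concatenation from the original code
String sql = "SELECT log.*, cpuinfo.*, memoryinfo.* FROM log"
        + " JOIN cpuinfo ON cpuinfo.Process = log.Process AND cpuinfo.Timestamp = log.Timestamp"
        + " JOIN memoryinfo ON memoryinfo.Process = log.Process AND memoryinfo.Timestamp = log.Timestamp"
        + " WHERE log.Process = ?";
try (PreparedStatement ps = con.prepareStatement(sql)) {
    ps.setString(1, process);
    try (ResultSet rs = ps.executeQuery();
         CSVWriter writer = new CSVWriter(new FileWriter(process + ".csv"), ',')) {
        writer.writeAll(rs, true); // streams the result straight into the CSV
    }
}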
Like others have said, fetching the data in smaller chunks might resolve the issue.
This is one of the other Stack Overflow threads that talks about this issue:
How to read all rows from huge table?
I would like to write Java code whose different outcomes are determined by data inside a MySQL column. I have everything set up and I can connect to the database and view data. What I don't know is how to use an "if" with a MySQL column.
Here is my code:
http://pastebin.com/UsJC7Qzx
What I'm trying to do specifically: I want the code to print "Thanks for voting" if the MySQL column "given" is equal to 0, and then set the column to 1. If the column is equal to 1, it will say "Thanks again for voting."
This is just a simple base for a voting reward system I'm doing for my video game.
If you don't have a very good understanding of what I am trying to say, read my notes inside the code.
It would look something like this:
while (given.next()) {
    if (given.getInt("given") > 0) {
        System.out.println("Thanks again for voting");
    } else {
        System.out.println("Thanks for voting");
    }
}
I'd recommend renaming the given result set to something like 'resultSet'.
I think you're almost there. Just need a couple of more lines.
st.executeQuery(give) returns a ResultSet. If you are guaranteed that your query will only return one result, you could simply do:
ResultSet result = st.executeQuery(give);
if (result.next()) { // advances the result set to the first row
    int actualVal = result.getInt("given");
    if (actualVal == 0) {
        System.out.println("Thanks for voting");
        // do the update here; note that reusing st closes 'result',
        // which is fine since we are done reading from it
        st.executeUpdate("update has_voted set given = 1 where ...........");
    } else {
        System.out.println("Thanks again for voting");
    }
}