Currently, I have a table being read from like so:
ps = con.prepareStatement("SELECT * FROM testrow");
rs = ps.executeQuery();
rs.next();
String[] skills = rs.getString("skills").split(";");
String[] skillInfo;
for (int i = 0; i < skills.length; i++) {
    skillInfo = skills[i].split(",");
    newid.add(Integer.parseInt(skillInfo[0]));
    newspamt.add(Byte.parseByte(skillInfo[1]));
    mastery.add(Byte.parseByte(skillInfo[2]));
}
rs.close();
ps.close();
The information is saved to the database by using a StringBuilder to form a single string of all the numbers that need to be stored, in the format number1,number2,number3;
I had written a test project to see if that method would be faster than using MySQL's batch method, and it beat MySQL by roughly 3 seconds. The only problem I'm facing now is when I go to read the information: MySQL completes the job in a few milliseconds, whereas splitting the data by the ";" character with String.split, and then also splitting by the "," character within a loop, takes about 3 to 5 seconds.
Is there any way I can reduce the amount of time it takes to load the data using String.split, or possibly another method?
Do not store serialized arrays in database fields. Use 3NF?
Do you read the information more often than you write it? If so (most likely), then optimising the write seems to be emphasising the wrong end of the operation. Why not store the info in separate columns and thus avoid splitting (i.e. normalise your data)?
If you can't do that, can you load the data in one thread and hand it off to another thread for splitting/storing the info? i.e. you read the data in one thread, and for each line, pass it through (say) a BlockingQueue to another thread that splits/stores.
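A minimal sketch of that hand-off, assuming a poison-pill sentinel to signal end of input. SplitPipeline, parseRow and the sample rows are illustrative names, and the reader thread's loop stands in for the actual rs.next() loop:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;

public class SplitPipeline {
    // Sentinel that tells the consumer no more rows are coming.
    static final String POISON = "\u0000EOF";

    // Parses "id,amt,mastery" triples out of one semicolon-delimited row.
    static List<int[]> parseRow(String row) {
        List<int[]> out = new ArrayList<>();
        for (String skill : row.split(";")) {
            String[] f = skill.split(",");
            out.add(new int[] { Integer.parseInt(f[0]),
                                Integer.parseInt(f[1]),
                                Integer.parseInt(f[2]) });
        }
        return out;
    }

    // The reader thread puts raw rows on the queue; this method parses them.
    static List<int[]> consume(BlockingQueue<String> queue) throws InterruptedException {
        List<int[]> parsed = new ArrayList<>();
        for (String row = queue.take(); !row.equals(POISON); row = queue.take()) {
            parsed.addAll(parseRow(row));
        }
        return parsed;
    }

    public static void main(String[] args) throws InterruptedException {
        BlockingQueue<String> queue = new ArrayBlockingQueue<>(64);
        Thread reader = new Thread(() -> {
            // In the real code this loop would be rs.next()/rs.getString(...)
            for (String row : new String[] { "1,2,3;4,5,6", "7,8,9" }) {
                try { queue.put(row); } catch (InterruptedException ignored) {}
            }
            try { queue.put(POISON); } catch (InterruptedException ignored) {}
        });
        reader.start();
        List<int[]> parsed = consume(queue);
        reader.join();
        System.out.println(parsed.size()); // 3 skill triples
    }
}
```

The bounded queue also gives you back-pressure: if parsing falls behind, the reader thread blocks instead of buffering unboundedly.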
in the format of number1,number2,number3
consider normalising the table, giving one number per row.
String.split uses a regular expression for its algorithm. I'm not sure how it's implemented, but chances are it is quite CPU-heavy. Try implementing your own split method, using a char value instead of a regular expression.
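A sketch of such a hand-rolled split, using indexOf on a literal char; FastSplit is a made-up name for the demo:

```java
import java.util.ArrayList;
import java.util.List;

public class FastSplit {
    // Splits on a single literal character using indexOf, avoiding the
    // regular-expression machinery behind String.split.
    static String[] split(String s, char sep) {
        List<String> parts = new ArrayList<>();
        int start = 0;
        int idx;
        while ((idx = s.indexOf(sep, start)) >= 0) {
            parts.add(s.substring(start, idx));
            start = idx + 1;
        }
        parts.add(s.substring(start)); // trailing segment
        return parts.toArray(new String[0]);
    }

    public static void main(String[] args) {
        String[] skills = split("1,2,3", ',');
        System.out.println(skills.length); // 3
    }
}
```

One behavioural difference to watch: unlike String.split, this version keeps trailing empty segments, so "a;b;" yields three parts, the last one empty.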
Drop the index while inserting, that'll make it faster.
Of course this is only an option for a batch load, not for 500-per-second transactions.
The most obvious alternative method is to have a separate skills table with rows and columns instead of a single field of concatenated values. I know it looks like a big change if you've already got data to migrate but it's worth the effort for so many reasons.
I recommend that instead of using the split method, you use a precompiled regular expression, especially in the loop.
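For example, compiling the two delimiter patterns once outside the loop might look like this (PrecompiledSplit and the sample string are illustrative stand-ins for the question's code):

```java
import java.util.regex.Pattern;

public class PrecompiledSplit {
    // Compile the delimiters once instead of recompiling the regex on
    // every String.split call inside the loop.
    static final Pattern SEMI = Pattern.compile(";");
    static final Pattern COMMA = Pattern.compile(",");

    public static void main(String[] args) {
        String raw = "1,2,3;4,5,6"; // stands in for rs.getString("skills")
        for (String skill : SEMI.split(raw)) {
            String[] f = COMMA.split(skill);
            System.out.println(f[0] + " " + f[1] + " " + f[2]);
        }
    }
}
```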
Related
Is there any difference in the performance of insert() vs append() from the StringBuilder class? I will be building plenty of short strings as text identifiers and asked myself this question... Should I initialize the SB with a separator and use insert + append, or just append?
Knowing that:
An insert at the end of the string representation is equivalent to an append in terms of time complexity (O(n)).
An insert anywhere other than at the end can't be achieved with an append (as they have different purposes).
For info, an insert may involve up to 3 System.arraycopy (native) calls, while an append involves 1.
You can easily conclude:
If you want to insert at the end of the string representation, use append
Otherwise, use insert
Doing so, you will get the best performance. But again, since these two methods serve two different purposes (with the exception of inserting at the end), there is no real question here.
They have different functionalities and different complexities.
insert:
(ensures the capacity of the backing array, copying to a new one if necessary)
shifts the elements after the insertion index (offset) to make room for the new item
Where append:
(ensures the capacity of the backing array, copying to a new one if necessary)
adds the new element at the tail of the array
So if you want to always add to the tail, the performance will be the same, since insert will not shift any elements.
So, I would use append, it is just cleaner.
According to the Java API docs, you must provide an offset if you use insert:
sb.insert(5, "String");
but sb.append("string") doesn't need one. I assume append has better performance than insert.
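A quick check of the claim above that inserting at the end is equivalent to appending; the class and method names are made up for the demo:

```java
public class InsertVsAppend {
    // Builds the same string two ways: with append, and with insert at the end.
    static String viaAppend(String[] parts) {
        StringBuilder sb = new StringBuilder();
        for (String p : parts) sb.append(p).append(',');
        return sb.toString();
    }

    static String viaInsert(String[] parts) {
        StringBuilder sb = new StringBuilder();
        for (String p : parts) {
            sb.insert(sb.length(), p);   // inserting at the current length...
            sb.insert(sb.length(), ","); // ...is equivalent to appending
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        String[] parts = { "1", "2", "3" };
        System.out.println(viaAppend(parts).equals(viaInsert(parts))); // true
    }
}
```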
I have a table which I need to query, then organize the returned objects into two different lists based on a column value. I can either query the table once, retrieving the column by which I would differentiate the objects and arrange them by looping through the result set, or I can query twice with two different conditions and avoid the sorting process. Which method is generally better practice?
MY_TABLE
NAME AGE TYPE
John 25 A
Sarah 30 B
Rick 22 A
Susan 43 B
Either SELECT * FROM MY_TABLE, then sort in code based on returned types, or
SELECT NAME, AGE FROM MY_TABLE WHERE TYPE = 'A' followed by
SELECT NAME, AGE FROM MY_TABLE WHERE TYPE = 'B'
Logically, a DB query from Java code will be more expensive than a loop within the code, because querying the DB involves several steps: connecting to the DB, creating the SQL query, firing the query, and getting the results back.
Besides, something can go wrong between firing the first and second query.
With an optimized single query and looping in the code, you can save a lot of time compared to firing two queries.
In your case, you can sort in the query itself if it helps:
SELECT * FROM MY_TABLE ORDER BY TYPE
In future, if more types are added to your table, you will not need to fire an additional query to retrieve them.
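A sketch of consuming that single ORDER BY query into one list per type; the in-memory String[] rows stand in for iterating the ResultSet, and PartitionByType is an illustrative name:

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class PartitionByType {
    // Groups rows by their TYPE column; works for any number of types,
    // which is why the single-query approach keeps working as types grow.
    static Map<String, List<String[]>> partition(List<String[]> rows) {
        Map<String, List<String[]>> byType = new LinkedHashMap<>();
        for (String[] row : rows) {
            // row = { NAME, AGE, TYPE }, as returned by SELECT * ... ORDER BY TYPE
            byType.computeIfAbsent(row[2], t -> new ArrayList<>()).add(row);
        }
        return byType;
    }

    public static void main(String[] args) {
        List<String[]> rows = List.of(
            new String[] { "John", "25", "A" },
            new String[] { "Rick", "22", "A" },
            new String[] { "Sarah", "30", "B" },
            new String[] { "Susan", "43", "B" });
        Map<String, List<String[]>> byType = partition(rows);
        System.out.println(byType.get("A").size() + " " + byType.get("B").size()); // 2 2
    }
}
```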
It is heavily dependent on the context. If each list is really huge, I would let the database do the hard part of the job with 2 queries. At the opposite extreme, in a web application using a farm of application servers and a central database, I would use one single query.
For the general use case, IMHO, I would conserve database resources, because the database is a common point of congestion, and use only one query.
The only objective argument I can find is that the splitting of the list occurs in memory, with a very simple algorithm and in a single JVM, whereas each query requires a bit of initialization and may involve disk access or loading of index pages.
In general, one query performs better.
Also, with issuing two queries you can potentially get inconsistent results (which may be fixed with a higher transaction isolation level, though).
In any case, I believe you still need to iterate through the result set (either directly or by using framework methods that return collections).
From the database point of view, you optimally have exactly one statement that fetches exactly everything you need and nothing else. Therefore, your first option is better. But don't generalize that answer in way that makes you query more data than needed. It's a common mistake for beginners to select all rows from a table (no where clause) and do the filtering in code instead of letting the database do its job.
It also depends on your data set volume. For instance, if you have a large data set, doing a SELECT * without any condition might take some time; but if you have an index on your TYPE column, then adding a WHERE clause will reduce the time taken to execute the query. If you are dealing with a small data set, then doing a SELECT * followed by your logic in the Java code is a better approach.
There are four main bottlenecks involved in querying a database.
The query itself - how long the query takes to execute on the server depends on indexes, table sizes etc.
The data volume of the results - there could be hundreds of columns or huge fields and all this data must be serialised and transported across the network to your client.
The processing of the data - java must walk the query results gathering the data it wants.
Maintaining the query - it takes manpower to maintain queries, simple ones cost little but complex ones can be a nightmare.
By careful consideration it should be possible to work out a balance between all four of these factors - it is unlikely that you will get the right answer without doing so.
You can query by two conditions:
SELECT * FROM MY_TABLE WHERE TYPE = 'A' OR TYPE = 'B'
This will do both for you at once, and if you want them sorted, you could do the same, but just add an order by keyword:
SELECT * FROM MY_TABLE WHERE TYPE = 'A' OR TYPE = 'B' ORDER BY TYPE ASC
This will sort the results by type, in ascending order.
EDIT:
I didn't notice that originally you wanted two different lists. In that case, you could just run this query, then find the index where the type changes from 'A' to 'B' and copy the data into two arrays.
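That split might be sketched like this, assuming the rows are already sorted by type (as the ORDER BY guarantees); the in-memory rows stand in for the fetched result, and all names are illustrative:

```java
import java.util.ArrayList;
import java.util.List;

public class SplitSortedList {
    // Finds the first index whose TYPE differs from the first row's TYPE,
    // then splits the ORDER BY TYPE result into two lists at that point.
    static List<List<String[]>> splitAtTypeChange(List<String[]> sortedRows) {
        int change = sortedRows.size(); // default: everything is one type
        String firstType = sortedRows.get(0)[2];
        for (int i = 1; i < sortedRows.size(); i++) {
            if (!sortedRows.get(i)[2].equals(firstType)) { change = i; break; }
        }
        List<List<String[]>> out = new ArrayList<>();
        out.add(new ArrayList<>(sortedRows.subList(0, change)));
        out.add(new ArrayList<>(sortedRows.subList(change, sortedRows.size())));
        return out;
    }

    public static void main(String[] args) {
        List<String[]> rows = List.of(
            new String[] { "John", "25", "A" },
            new String[] { "Rick", "22", "A" },
            new String[] { "Sarah", "30", "B" });
        List<List<String[]>> lists = splitAtTypeChange(rows);
        System.out.println(lists.get(0).size() + " " + lists.get(1).size()); // 2 1
    }
}
```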
In my application I have a result set containing more than 20000 rows. I want to save it to an ArrayList. I am using the code below for this.
Class.forName("com.mysql.jdbc.Driver");
Connection conn = DriverManager.getConnection("jdbc:mysql://localhost:3306/Data", "root", "root");
Statement stmt = conn.createStatement();
String query = "select * from temp";
ResultSet rs = stmt.executeQuery(query);
ArrayList<String> varList = new ArrayList<String>();
while (rs.next()) {
    varList.add(rs.getString(1));
}
When I use this query it takes a long time to fetch the data from the result set, and it is too slow if the table contains more than 20000 entries. How can this be solved? Any suggestions would be greatly appreciated.
I strongly suspect that you're trying to solve the wrong problem. Whatever you're doing with those 20,000 rows, you should do it at the database and only return the results of the operation (which, hopefully, will take much less space) to the client.
If you told us exactly what you're trying to do with the data, we might be able to offer more specific suggestions on how to do that.
(Alternatively, if you really do need all the 20,000 rows at the client, you might want to skip the database entirely and just store them at the client.)
20000 is not a big number in terms of database rows.
Your problem could be in ArrayList.
By default, an ArrayList has an initial capacity of 10. When the number of inserted items exceeds the current capacity, Java allocates a new backing array (roughly 1.5 times larger) and copies the contents of the previous one into it. For 20000 adds, this reallocation happens repeatedly, so you can see why it is slow. Doing the following may solve your problem.
int rsSize = getResultSetSize(connection,query); //return the size of the result set first
ArrayList<String> varList = new ArrayList<String>(rsSize);
I suspect that your issue isn't the database; it's likely the appending of items to the list. You should provide Java with more memory. If it is the list.add, then you could speed that up by allocating an array the size of the result and then using indexing to insert the data into the array.
What do you want to do with those 20000 records after fetching them? Whatever it is, see if you can do it at the DB level using PL/SQL code. Many programmers don't use the power of PL/SQL. Most of what you can do with other high-level programming languages can be done with PL/SQL too.
When I am executing a SQLite query (using sqlite4java) I am following the general scheme of executing it step by step and obtaining one row at a time. My final result should be a 2-D array, whose length should correspond to the amount of records. The problem I am facing is that I don't know in advance how many records are to be returned by my query, so I basically store them in an ArrayList and then copy the pointers to the actual array. Is there a technique to somehow obtain the number of records to be returned by the query prior to executing it fully?
My final result should be a 2-D array, whose length should correspond to the amount of records.
Why? It would generally be a better idea to make the result a List<E> where E is some custom type representing "a record".
It sounds like you're already creating an ArrayList - so why do you need an actual array? The Collections API is generally more flexible and convenient than using arrays directly.
No, if using JDBC. You can, however, first do a COUNT() query, and then the real query.
In my application I have to fetch records and put them into a 2D array. I have to fire two queries: the first to find out the count so that I can initialize the array, and the second to fetch the data. This results in a performance hit. I need a solution to improve the performance.
Thanks.
I have to fire two queries first to find out the count so that I can initialize the array and second is to fetch the data.
You can combine your 2 queries as:
select *,(select count(*) from table) as counting from table;
Also consider using a suitable Collection, such as List<List<Object>>. For improved type-safety, consider using Class Literals as Runtime-Type Tokens; the query example is near the bottom.
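A minimal sketch of collecting rows of unknown count into a List<List<Object>> without a prior COUNT query; RowsAsLists is an illustrative name, and the Object[][] here stands in for iterating a ResultSet:

```java
import java.util.ArrayList;
import java.util.List;

public class RowsAsLists {
    // Collects rows of unknown count into a List<List<Object>>; no COUNT
    // query is needed because the outer list grows as rows arrive.
    static List<List<Object>> collect(Object[][] rows) {
        List<List<Object>> table = new ArrayList<>();
        for (Object[] row : rows) { // stands in for while (rs.next())
            List<Object> r = new ArrayList<>();
            for (Object cell : row) r.add(cell); // rs.getObject(i) per column
            table.add(r);
        }
        return table;
    }

    public static void main(String[] args) {
        Object[][] rows = { { "John", 25 }, { "Sarah", 30 } };
        List<List<Object>> table = collect(rows);
        System.out.println(table.size() + " " + table.get(0).get(1)); // 2 25
    }
}
```

If a plain array is still required at the end, table.toArray() converts it in one pass once the true row count is known.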