I have a ton of raw html files that I'm parsing and inserting to a MySQL database via a connection in Java.
I'm using "REPLACE INTO" statements and this method:
public void migrate(SomeThread thread) throws Exception{
PreparedStatement threadStatement = SQL.prepareStatement(threadQuery);
thread.prepareThreadStatement(threadStatement);
threadStatement.executeUpdate();
threadStatement.close();
for(SomeThread.Post P : thread.threadPosts){
PreparedStatement postStatement = SQL.prepareStatement(postQuery);
P.preparePostStatement(postStatement);
postStatement.executeUpdate();
postStatement.close();
}
}
I am running 3 separate instances of my program each in its own command prompt, with their own separate directory of htmls to parse and commit.
I'm using HeidiSQL to monitor the database and a funny thing is happening where I'll see that I have 500,000 rows in a table at one point for example, then I'll close HeidiSQL and check back later to find that I now have 440,000 rows. The same thing occurs for the two tables that I'm using.
Both of my tables use a primary key called "id", each of their ID's have their own domain but it's possible their values overlap and are overwriting each other? I'm not sure if this could be an issue because I'd think SQL would differentiate between the table's "local" id values.
Otherwise I was thinking it could be that since I'm running 3 separate instances that each have their connection to the DB, some kind of magic is happening where right as one row is being committed, the execution swaps to another commit statement, displaces the table, then back to the first commit and then some more magic that causes the database to roll back the number of rows collected.
I'm pretty new to SQL so I'm not too sure where to start, if somebody has an idea about what the heck is going on and could point me in the right direction I'd really appreciate it.
Thanks
You might want to use INSERT INTO instead of REPLACE INTO.
Data doesn't disappear.
Here are some tips:
Do you have another thread running that actually deletes entries?
Do other people have access to the database?
Not sure what HeidiSQL may do. To exclude that possibility maybe use MySQL Workbench instead.
Yeah now that I run a COUNT(*) query against my tables I see that all my rows are in fact there.
Most likely the heidiSQL summary page is just a very rough estimate.
Thanks for the suggestion to use workbench pete, I will try it and see if it is better than Heidi as Heidi is freezing up on me on a regular basis.
Related
i need a little help here because i'm struggling a little bit to find the best solution for my problem. i googled and dont have any enlightening answer.
So, first of all, i'll explain the idea.
1 - i've a java application that insert data in my database (Oracle DB) using jdbc.
2 - My database is logically splited in two. One part that contains table with exported information (from another application) and another part with table that represents some reports.
3 - my java app only insert information in export table.
4 - I've developed some packages that makes the transformation of data from export table to report table (generate some reports).
5 - This packages are scheduled to execute 2, 3 times a day
So, my problem is that when transformation task starts, i want to prevent new DML operations. Then, when transformation stops, all new data that was supposed to be inserted/updated during that time, shall be inserted again in the export tables.
i tought in two approaches:
1 - during transformation time deviate the DML ops to temporary table
2 - lock the tables but i've not so many experience using this. My main question is, can i force DML operations in jdbc to wait until the lock is finished? Not tried yet, but read here and there that after some that is thrown a lockwaittimeout exception or something like that.
Can anyone more experienced give me some advices?
Any doubts on what i'm trying to do just ask.
Do not try locking tables as a solution. Sadly, that is common but rarely necessary. Just a few ideas:
at start of transformation select * data from export table into global_temp table. Then execute your transformation packages on that temp table
create a materialized view like select * data from export table. Investigate the options to refresh on commit but it seems you require to refresh the table just before your transformation
analyze your exported data. If it is like many other cases most of the data will never change once imported. Only new data needs to be analyzed. To aid in processing add a timestamp field called date_last_modified and a trigger on the table. When a row is updated then update the date_last_modified. This allows you to choose the smallest data set possible of "only changed records"
you should also investigate using bulk collect to optimize your cursor. This will allow you get a group of records all at once, sort of a snapshot of the data at a point in time
I believe you are over thinking this. If you get a group of records one at a time then Oracle will get the state of the record as of the last commit by any user. If you bulk collect a group of records they go into memory and will, again, represent the state as of a point in time.
The best way to feel more comfortable about this is to set up a test case. Set up a cursor that sleeps during every processing cycle. Open another session and change the data that is being processed. See what happens....
What I mean is I have a program that executes inserts in batches of 100k. Each one of these inserts is assigned a new ID from a sequence on insert. I want to keep the batch process for obvious reasons, but I also need to then pull out each ID as it is created and do things with it before I move on to the next insert. Is there a way to do this?
Things work differently in PostgreSQL than MySQL. First you have to write your insert as:
INSERT INTO foo (...) VALUES (...)
RETURNING id;
The RETURNING id is important as that tells the insert statement to return something. Then you should be able to pull back the id as what you could expect from a select statement.
I am not quite sure how the JDBC driver for PostgreSQl implements this regarding batch processing though. If you have to, you could probably modify this to store the id in a temporary table or something that you could query from after.
I have searched the web for simple examples to this but to no avail. I need to run a select and insert operation as an atomic unit in Java, using JDBC against an Oracle database.
Effectively I need to do the following:
Select code from users
Go through all codes until I find one that is not used (as users can be deleted there may be codes available in the middle of the range)
Insert new user with that available code
This is a simple operation normally, but as my application is multi-threaded I'm not sure how to go about this. As concurrent threads running at the same time could both try and insert using the same value for code.
There are a couple workarounds or hacks that I can think of to do the job but in general how can I lock the table to make this operation atomic? Most of what I've seen involves row locks but as I'm not updating I don't see how this applies.
This is a tough problem to do entirely in SQL. Any solution is going to have race condition problems. If I was going to do it entirely in SQL I'd use a deleted code table. When users then get deleted you'd use some service to add their code to the deleted table. If the deleted code table is empty threads would use a sequence number to get their new code. Getting a code from the deleted would need to be in a synchronized block because of the get and then set nature with multiple SQL operations. I don't think SQL transactions are going to help there. They may keep the data consistent but if two threads use the same code then one of the two commits is going to throw an exception.
I think a better, and faster, mechanism would be to have a separate thread manage these deleted codes. It could write it in a database but also keep a BlockingQueue of deleted codes for the other threads to consume. If there must be no holes and you are worried about crashing then it will need to validate the list of available holes by querying the user table at launch. It would not need to synchronize or do any SQL transactions because only it would be deleting from the deleted code table.
Hope this helps.
I would lean toward putting the logic in a stored procedure. Use "select for update" to lock, then commit to unlock.
You can add a filter to your insert statement and retry logic on the client end, I guess:
determine an available code (proposed code)
perform the insert with a filter determine the number of rows from the executeUpdate result (0 means a concurrent thread grabbed this code, try again)
The insert would look something along these lines where 3 is your new id, 'Joe' your new user, and proposedCode the one you think is available:
INSERT INTO users
SELECT 3, :proposedCode, 'Joe'
FROM dual
WHERE :proposedCode NOT IN (SELECT code FROM users)
How about:
insert into usertable (
id,
code,
name
) values (
user_id_sequence.nextval,
(
select min(newcode)
from usertable, (
select level newcode
from dual
connect by level <= (select max(code)+1 from usertable))
where not exists (select 1 from usertable where code = newcode)
),
'mynewusername'
)
EDIT:
changed to max(code) + 1, so if there is no gap available, there is a new code available.
My question is a little vague at the moment since I'm not sure that I'm supposed to post any company code online or anything. But here goes.
Suppose I need to update a specific field in a MySQL database. In order to do this using my Java client program, I have to use multiple SELECT statements in order to check that the field should be updated, and then appropriately update it using the information that has been retrieved.
eg.
//created a Connection called con already...
PreparedStatement selectStatement = con.prepareStatement("SELECT * FROM myTable" /*+ etc*/); //example query only! I'm not actually going to use "SELECT * FROM myTable"!
//more selectStatements follow
PreparedStatement updateStatement = con.prepareStatement("UPDATE myTable SET field1 = ? WHERE id = ?");
ResultSet rs = selectStatement.executeQuery();
//more ResultSets from the other selectStatements
//process ResultSets and retrieve information that indicates wwhether an update must take place
if(conditionOccurred) { //Assuming we need to update
updateStatement.setText(...);
updateStatement.executeUpdate();
}
(I haven't included try-catches in the code (sorry, I'm a bit lazy since this is just a contrived example) but I'd have to catch the potential SQLExceptions as well, I guess...)
What I'm wondering is: will it still be more "expensive", or costly in terms of speed if I delete the row and then insert a new row that contains all the updated information, given that I now need to use multiple select statements to check whether an update should occur? (memory is not such a big issue at the moment, though if something I've done has a massive flaw with regards to this I'd love to hear it!)
TL; DR: If I use multiple SELECT statements and then an UPDATE to some field(s), will it be more efficient to simply DELETE and then INSERT a new row?
Extra details: the table I'm working with at the moment has an auto-incremented ID, a VARCHAR field (the one to be updated, has a uniqueness constraint), 2 date fields and a CHAR(64) field. Not sure if it helps in answering the question, but I'll provide it anyway.
Please let me know if there are more details you'd need, and thank you in advance to anyone who might provide some insight.
To fully answer your question we would need to see your SELECT statement, however if your UPDATE does not alter the primary key values I would assume UPDATE is more efficient. The reasoning behind this is that an index values would not have to be adjusted where in the case of the DELETE & INSERT the index would be.
As in most cases the only sure fire way to test this is by using both methods and bench marking the elapsed time.
I'm answering your question based on the knowledge I acquired from my advanced database management course. I would say it would be very subjective as your concern is in terms of speed here and not the usage of memory.
When retrieval are done, in terms of your Select statements, your data are cached and when any necessary Update are required, you directly edit the fields in the cache. This save a read and write trip if you were performing the latter of Delete and Insert.
This would in my understanding, save you processing time in terms of millisecond for one single transaction, and if you look at a big picture, it will save you a lot when multiple transaction are performed. However, if your select statements involves too many queries dealing with a large size of data, it might turn out that your latter method is more efficient.
I believe with your additional inputs of your SQL statements, we would be able to give you a better and more accurate advise. :) I hope it helps.
I have a customer with a very small set of data and records that I'd normally just serialize to a data file and be done but they want to run extra reports and have expandability down the road to do things their own way. The MySQL database came up and so I'm adapting their Java POS (point of sale) system to work with it.
I've done this before and here was my approach in a nutshell for one of the tables, say Customers:
I setup a loop to store the primary key into an arraylist then setup a form to go from one record to the next running SQL queries based on the PK. The query would pull down the fname, lname, address, etc. and fill in the fields on the screen.
I thought it might be a little clunky running a SQL query each time they click Next. So I'm looking for another approach to this problem. Any help is appreciated! I don't need exact code or anything, just some concepts will do fine
Thanks!
I would say the solution you suggest yourself is not very good not only because you run SQL query every time a button is pressed, but also because you are iterating over primary keys, which probably are not sorted in any meaningful order...
What you want is to retrieve a certain number of records which are sorted sensibly (by first/last name or something) and keep them as a kind of cache in your ArrayList or something similar... This can be done quite easily with SQL. When the user starts iterating over the results by pressing "Next", you can in the background start loading more records.
The key to keep usability is to load some records before the user actually request them to keep latency small, but keeping in mind that you also don't want to load the whole database at once....
Take a look at indexing your database. http://www.informit.com/articles/article.aspx?p=377652
Use JPA with the built in Hibernate provider. If you are not familiar with one or both, then download NetBeans - it includes a very easy to follow tutorial you can use to get up to speed. Managing lists of objects is trivial with the new JPA and you won't find yourself reinventing the wheel.
the key concept here is pagination.
Let's say you set your page size to 10. This means you select 10 records from the database, in a certain order, so your query should have an order by clause and a limit clause at the end. You use this resultset to display the form while the users navigates with Previous/Next buttons.
When the user navigates out of the page then you fetch an other page.
https://www.google.com/search?q=java+sql+pagination