Hello.
A few days ago our DB team detected sessions with an 'ACTIVE' status for clients that are no longer active. An investigation showed two main sources of such sessions:
remote SQL Developer connections (this case is not very interesting),
abnormal shutdowns of Tomcat, where our application was running (for example, 'kill -9').
It seems strange to me that all of these sessions are in the 'ACTIVE' status. Could somebody please clarify how this can happen? Maybe the underlying server processes are waiting on the corresponding sockets, or ... Since everything is fine after a proper Tomcat shutdown, the root cause seems to be related to transactions.
Would it help if we set the 'IDLE_TIME' (for all connections) and 'EXPIRE_TIME' (for all our RAC instances)?
Am I right that, with the above parameters set, the following should happen:
When a client connects, its session is marked as 'ACTIVE'.
Regardless of the 'ACTIVE' status, a 'ping' process driven by the 'EXPIRE_TIME' parameter pings the client.
If the ping fails for the EXPIRE_TIME period, Oracle kills the session even though it is 'ACTIVE'.
If the client responds to the pings but does no processing, then after the IDLE_TIME period its session becomes 'INACTIVE' and (if the 'IDLE_TIME' parameter is set) after some time 'SNIPED'. After that the 'SMON' process starts housekeeping for this session (and for others with the 'SNIPED' status).
UPDATE:
It seems that the only way to deal with this situation is to configure the Oracle instances.
Here are the results of my investigation:
https://community.oracle.com/thread/873226?start=0&tstart=0
Dead Connection Detection is configured in the server-side sqlnet.ora file
with the parameter SQLNET.EXPIRE_TIME = <# of minutes>
The other option would be to implement idle_time in a profile setting and then kill SNIPED sessions with a job (when idle_time is
reached, a session goes from INACTIVE to SNIPED).
If I open up a connection and go off to lunch, an IDLE_TIME limit will
cause my session to be terminated after 15 minutes of inactivity. An
EXPIRE_TIME of 15 minutes will merely have Oracle send a packet to my
client application to verify that the client has not failed. If the
client is up, it will answer the ping, and my session will stay around
indefinitely. EXPIRE_TIME only causes sessions to be killed if the
client application fails to respond to the ping, implying that the
client process has failed or that the client system has failed.
IDLE_TIME will kill sessions that don't have activity for a period of
time, but that generally doesn't work well with applications that
maintain a connection pool, since the assumption there is that the
connection pool will have a fair number of connections that are idle
for periods of the day and since applications that use connection
pools tend to react poorly to connections in the pool getting killed.
https://asktom.oracle.com/pls/asktom/f?p=100:11:0::::P11_QUESTION_ID:2233109200346833212
TCP/IP doesn't interrupt things by default, by design. When a
connection goes away, the client and/or server do not immediately get
notified. So, if your client connects to the database and does nothing
and you unplug your client (blue screen it, kill it, pull out the network cable, crash the computer, whatever) the odds are the session
will stay in the database. The server will not know that it will never
receive a message from you. We have dead client detection for that if
it becomes an issue:
http://download.oracle.com/docs/cd/B12037_01/network.101/b10776/sqlnet.htm#sthref476
As for the active session, that is really easy. You open a connection,
you submit a request over this connection like "lock table T". Table t
is locked by your transaction in your session. You then submit a block
of code like:
begin
  loop
    dbms_lock.sleep(5);
  end loop;
end;
/
your session will be active for as long as that code is running - the
client process is blocked on a socket read waiting for the server to
send back a result - a response (which of course will never come). The
server isn't touching the network at all right now - it is active with
your code. So, if your client 'dies' right now - the block of code
will continue to run, and run, and run - and since it never commits -
it'll just hang in there and run and of course any locks you have will
remain in place.
http://www.databaseskill.com/4267817/
http://www.dba-oracle.com/t_connect_time_idle_expire_timeout.htm
The sqlnet.expire_time parameter is used to set a time interval, in
minutes, to determine how often a probe should be sent verifying that
client/server connections are active. If you need to ensure that
connections are not left open indefinitely (or up to the time set by
operating system-specific parameters), you should set a value that is
greater than 0. This protects the system from connections left open
due to an abnormal client termination.
https://asktom.oracle.com/pls/apex/f?p=100:11:0::NO::P11_QUESTION_ID:453256655431
If the session is waiting for a resource (locks or latches), and if this
wait time exceeds the idle_time setting in the profile, does the session
get sniped, even if the session is in the middle of a transaction and
waiting for a lock etc.?
If so, will there be any entries in the alert log?
Followup
if waiting for a lock, you are active -- not idle.
This page caught my attention. Six months ago I had an Oracle Support
case for a Data Guard issue, and one of the Oracle engineers noticed that I use
IDLE_TIME. He told me that this parameter doesn't work very well
because Oracle doesn't release the resources of sessions that were marked
as sniped until the next time the user tries to use the session (it waits to
tell the client that the session was killed before clearing the session resources).
Followup
... after an investigation... the "session" is there, that
will not go away until the client acknowledges it, but the
"transaction" is gone.
Tom, I've altered a profile to have IDLE_TIME=240(4 hours) and made
sure my resource_limit parameter is set to TRUE. When I query
v$session I see some "sniped" sessions, but also "inactive" ones that
have been idle for more than a day. All those users have this profile
assigned to them. If the user session was connected before idle_time
was set, would those sessions be affected by this change or not? I've
made a change quite some time ago. Is there anything else I should
have done?
Followup
If the user session was connected before idle_time was set,
they are "grandfathered" in -- they will not be sniped. it only
affects new sessions.
http://agstamy.blogspot.ru/2011/03/profile-and-resource-limit.html
Additional material and recommendations: https://rebby.com/blog.php?detail=32
We tested the parameters listed in the investigation above, and everything worked correctly.
In our code (which runs as a scheduled job via a timer), threads run in parallel to perform database operations. The problem is that each thread opens a connection via the Hibernate session factory. These connections are closed after every database action but still remain in the connection pool (as INACTIVE). They are released only after the job/main process is killed. Is there any way to release a connection from the pool as soon as the database operation finishes? When we use a cron job instead of a timer, the process exits automatically, but we do not want to use cron here.
Kindly help us resolve this, as we are already nearing our production release.
Note: We found out about this when QA tested the job under heavy load; new connections were opened for each load.
You need to restrict the number of threads getting created in the thread pool.
dotConnect for Oracle uses connection pooling. The OracleConnection connection string has the Pooling parameter. If Pooling=true (the default value), a connection is not destroyed when it is closed; it is placed in the pool instead. When a new connection with the same connection string is opened, it is taken from the pool (if there are free connections) instead of a new one being created. This provides significant performance improvements. If you use 800 connections that are connected for 10-15 seconds each, and there are only a few different connection strings, you may not have 800 actual connections. Closed connections are placed in the pool and are taken from it when a new connection with the same connection string is opened. No additional connection is opened in that case.
You can disable pooling by adding 'Pooling=false' to the connection string. In that case, a closed connection is removed from memory and its session is freed. However, this may lead to performance loss.
Most likely, pooling will not create too many sessions. Try testing your application with pooling on. If the number of sessions grows too large, you can disable pooling.
For more information, please refer to http://www.devart.com/dotconnect/oracle/docs/FAQ.html#q54
I have found the root cause of the issue and also the solution.
The root cause was the minimum and maximum number of connections and the timeout parameter.
The minimum was 5, the maximum was 20, and the timeout was 800 seconds, but our job was scheduled to run every minute. With this configuration, connections were not released within a minute.
Another issue was that our code did not use the session factory as a singleton but initialized one for each thread. Since the factory was not shared, each session factory created 5 connections by default and could grow to 20. Because the timeout was also long, the next job run started before the connections were released and created its own set of new connections.
Eventually the pool filled up and Oracle became unavailable.
We fixed this by sharing the session factory across threads and by lowering the timeout so that connections are released from the pool.
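The "one shared factory instead of one per thread" fix described above could look roughly like the sketch below. It is only an illustration: SharedFactory and SharedFactoryDemo are hypothetical names, and a plain Object stands in for the real Hibernate session factory so that the example runs without any libraries.

```java
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Supplier;

// Hypothetical holder: builds one expensive factory (for example a session
// factory that owns a connection pool) and shares it across all threads.
final class SharedFactory<T> {
    private final Supplier<T> builder;
    private volatile T instance;

    SharedFactory(Supplier<T> builder) {
        this.builder = builder;
    }

    // Double-checked locking: the builder runs at most once, even under contention.
    T get() {
        T local = instance;
        if (local == null) {
            synchronized (this) {
                local = instance;
                if (local == null) {
                    local = builder.get();
                    instance = local;
                }
            }
        }
        return local;
    }
}

final class SharedFactoryDemo {
    // Returns how many times the builder ran when `threads` workers asked for the factory.
    static int buildsFor(int threads) {
        AtomicInteger builds = new AtomicInteger();
        SharedFactory<Object> shared = new SharedFactory<>(() -> {
            builds.incrementAndGet();
            return new Object(); // stand-in for the real factory and its pool
        });
        Thread[] workers = new Thread[threads];
        for (int i = 0; i < threads; i++) {
            workers[i] = new Thread(shared::get);
            workers[i].start();
        }
        for (Thread worker : workers) {
            try {
                worker.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException(e);
            }
        }
        return builds.get();
    }
}
```

With a holder like this, the number of pools (and therefore the minimum and maximum connection counts) no longer multiplies with the number of worker threads.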
We are using Oracle 11g with JDK 1.8.
In our application we use XAConnection and XAResource for DB transactions, i.e. distributed transactions.
On a few occasions we need to kill our Java process to stop the application.
If we restart the application after killing it, we get the exception below when running a DB transaction:
java.sql.SQLException: ORA-02049: timeout: distributed transaction waiting for lock
After this, we cannot use our application for a few hours, until the lock is released.
Can someone suggest a solution so that we can continue working instead of waiting for the lock to be released?
I have tried the following option:
a) Fetched the SID and killed the session using an ALTER command. Even after this, the table lock is not released.
I am dealing with a very small amount of data.
I followed a similar topic, which had tips on what to do with distributed connections.
Oracle connections remain open until you end your local session or until the number of database links for your session exceeds the value of OPEN_LINKS. To reduce the network overhead associated with keeping a database link open, use this clause to close the link explicitly if you do not plan to use it again in your session.
I believe that if you close your connections and sessions after DDL execution, this issue should not happen.
Another possibility is given in this question:
One possible way might be to increase the INIT.ORA parameter for distributed_lock_timeout to a larger value. This would then give you a longer time to observe the v$lock table as the locks would last for longer.
To achieve automation of this, you can either
- Run an SQL job every 5-10 seconds that logs the values of v$lock or the query that sandos has given above into a table and then analyze it to see which session was causing the lock.
- Run a STATSPACK or an AWR Report. The sessions that got locked should show up with high elapsed time and hence can be identified.
v$session has 3 more columns, blocking_instance, blocking_session and blocking_session_status, that can be added to the query above to give a picture of what is getting locked.
I hope I helped you, my friend.
I have a Spring MVC + MySQL (JDBC 4) + c3p0 0.9.2 project.
In c3p0, maxIdleTime is 240 (i.e. 4 minutes), and wait_timeout in MySQL's my.ini is set to 30 seconds.
According to c3p0
maxIdleTime:
(Default: 0)
Seconds a Connection can remain pooled but unused before being discarded. Zero means idle connections never expire.
According to Mysql
wait_timeout: The number of seconds the server waits for activity on a
noninteractive connection before closing it.
Now I have some doubts about this (I know some of the answers; I just want to be sure I am correct):
Does an unused connection mean a connection that is in the sleep state according to MySQL?
What are interactive and noninteractive connections?
Are unused connections and noninteractive connections the same? My DBA set wait_timeout to 30 seconds (he arrived at this value by observing the DB server, so that very few connections stay in sleep mode). This means a connection can be in sleep mode for 30 seconds before it is closed. On the other hand, c3p0's maxIdleTime is set to 240 seconds, so what role does maxIdleTime play in this case?
What is interactive_timeout?
First, let's understand the MySQL properties.
interactive_timeout: the timeout, in seconds, for interactive sessions such as the mysql command-line client. Such connections are in the sleep state while idle. Most of the time this is set to a higher value because you don't want to get disconnected while you are working in the MySQL CLI.
wait_timeout: the number of seconds of inactivity that MySQL waits before it closes a non-interactive connection, for example one opened from Java. Such connections are in the sleep state while idle.
Now let's understand the c3p0 properties and their relation to the DB properties. (I am just going to copy from your question.)
maxIdleTime: (Default: 0) Seconds a Connection can remain pooled but unused before being discarded. Zero means idle connections never
expire.
This refers to how long a connection object can sit unused in the pool and still be available. Once the timeout is over, c3p0 destroys or recycles it.
The problem comes when maxIdleTime is higher than wait_timeout.
Say maxIdleTime is 50 seconds and wait_timeout is 40 seconds: there is a chance that you will get a connection timeout exception (Broken Pipe) if you try to do any operation in the last 10 seconds. So maxIdleTime should always be less than wait_timeout.
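The rule above (the pool should give up on an idle connection before the server does) can be written as a small sanity check, for example at application startup. This is only a sketch: PoolTimeoutCheck is a hypothetical helper, not part of c3p0 or MySQL, and both values are assumed to be in seconds.

```java
// Hypothetical startup check for the relation between the pool's maxIdleTime
// and the server's wait_timeout (both assumed to be given in seconds).
final class PoolTimeoutCheck {
    // True when the pool discards an idle connection before the server would close it.
    static boolean poolExpiresFirst(int maxIdleTimeSeconds, int waitTimeoutSeconds) {
        // In c3p0, maxIdleTime == 0 means "idle connections never expire",
        // so the pool can never win against a finite server-side timeout.
        return maxIdleTimeSeconds > 0 && maxIdleTimeSeconds < waitTimeoutSeconds;
    }
}
```

For the values in the question, poolExpiresFirst(240, 30) is false, so the pool may hand out connections that MySQL has already closed; a setting such as poolExpiresFirst(25, 30) satisfies the rule.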
Instead of maxIdleTime, you can use the following properties.
idleConnectionTestPeriod sets a limit to how long a connection will
stay idle before testing it. Without preferredTestQuery, the default
is DatabaseMetaData.getTables() - which is database agnostic, and
although a relatively expensive call, is probably fine for a
relatively small database. If you're paranoid about performance use a
query specific to your database (i.e. preferredTestQuery="SELECT 1")
maxIdleTimeExcessConnections will bring back the connectionCount back
down to minPoolSize after a spike in activity.
Please note that pool properties (e.g. maxIdleTime) only affect connections that are in the pool. If Hibernate has acquired a connection and keeps it idle for longer than maxIdleTime, the pool will not discard it; if it stays idle past wait_timeout and you then try any operation, you will get "Broken Pipe".
A lower wait_timeout on MySQL is good, but it is not always right when you have an application that is already built.
Before reducing it, you have to make sure that your application does not keep a connection open for longer than wait_timeout.
You also have to consider that acquiring a connection is an expensive task; if wait_timeout is too low, it defeats the whole purpose of having a connection pool, because the pool will frequently have to acquire new connections.
This is especially important when you do not manage connections manually, for example when you use the Spring transactional API. Spring starts a transaction when you enter a @Transactional-annotated method, so it acquires a connection from the pool. If you then make a web service call or read a file that takes longer than wait_timeout, you will get an exception.
I have faced this issue once.
In one of my projects I had a cron job that processed orders for customers. To make it faster I used batch processing: I retrieved a batch of customers and did some processing (no DB calls). When I then tried to save all the orders, I used to get a broken pipe exception. The problem was that my wait_timeout was 1 minute and order processing took longer than that, so we had to increase it to 2 minutes. I could have reduced the batch size, but that would have made the overall processing slower.
unused connection means the connection which are in sleep state according to mysql(?)
According to MySQL, this simply means that a connection was established with the database, but there has been no activity on it for some time. Depending on MySQL's configuration (which can be changed), such a connection is eventually destroyed.
What is interactive and noninteractive connections?
Interactive connections are those where you interact with MySQL from the command line using your keyboard; in short, where you type the queries yourself.
Non-interactive connections, the ones wait_timeout applies to, are those that your code establishes with MySQL.
Are unused connections and noninteractive connections the same? My DBA set wait_timeout to 30 seconds (he arrived at this value by observing the DB server, so that very few connections stay in sleep mode). This means a connection can be in sleep mode for 30 seconds before it is closed. On the other hand, c3p0's maxIdleTime is set to 240 seconds, so what role does maxIdleTime play in this case?
maxIdleTime is set in your code, in the Hibernate/JPA configuration, where you ask the pool to close a connection (for example) after it has been unused for that long. You own this setting as a developer.
wait_timeout, on the other hand, is on the MySQL side, so it is up to the DB administrator to set and change it.
What is interactive_timeout?
Again, interactive_timeout applies when you connect to MySQL from the command line and type queries at the keyboard; it is the timeout MySQL uses for such sessions.
If you want to know more about how to change these values, go through this link:
http://www.serveridol.com/2012/04/13/mysql-interactive_timeout-vs-wait_timeout/
Hope now it is clear to you.:)
I am considering creating a HBase table when my application starts up and leaving it open as long as my application is running. My application may run indefinitely.
What happens if I never close the HBase table?
Is there a maximum time the connection can be open/idle before it needs to be reinitialized?
How is the connection closed if the system crashes?
I have HBase The Definitive Guide but I have not found the information I am looking for in there. If there are any online references for this then please provide them.
This was extracted from "HBase in Action" page 25:
"Closing the table when you’re finished with it allows the underlying
connection resources to be returned to the pool."
This blog post is about timeouts in HBase. Generally speaking, there are a lot of them:
ZK session timeout (zookeeper.session.timeout)
RPC timeout (hbase.rpc.timeout)
RecoverableZookeeper retry count and retry wait (zookeeper.recovery.retry, zookeeper.recovery.retry.intervalmill)
Client retry count and wait (hbase.client.retries.number, hbase.client.pause)
You may try to raise them a bit and set a really high value for the retry count. This can keep your sessions alive for a very long time.
When the system running the HBase client crashes, the connection is closed by timeout.
We are facing an unusual problem in our application. Within the last month it reached an unrecoverable state and recovered only after an application restart.
Background: Our application queries a database to fetch some information, and this database is hosted on a separate node.
Problematic case: When we analyzed the thread dump, we saw that all the threads were in the RUNNABLE state, fetching data from the database, but they had not finished even after 20 minutes.
After the application restart, all threads recovered as expected, and CPU usage was also normal.
Below is the thread dump
"ThreadPool:2:47" prio=3 tid=0x0000000007334000 nid=0x5f runnable [0xfffffd7fe9f54000]
   java.lang.Thread.State: RUNNABLE
        at oracle.jdbc.driver.T2CStatement.t2cParseExecuteDescribe(Native Method)
        at oracle.jdbc.driver.T2CPreparedStatement.executeForDescribe(T2CPreparedStatement.java:518)
        at oracle.jdbc.driver.T2CPreparedStatement.executeForRows(T2CPreparedStatement.java:764)
        at ora
All threads in the same state.
Questions:
what could be the reason for this state?
how to recover under this case ?
It's probably waiting for network data from the database server. Java threads waiting (blocked) on I/O are described by the JVM as being in the state RUNNABLE even though from the program's point of view they're blocked.
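The effect is easy to reproduce without a database. The sketch below (BlockedReadState is a hypothetical name; it uses only a loopback socket from the JDK) starts a thread that blocks on a socket read that never receives data and then samples that thread's state. On the JVMs I am aware of, the sampled state is RUNNABLE, matching the dump in the question.

```java
import java.io.IOException;
import java.net.InetAddress;
import java.net.ServerSocket;
import java.net.Socket;

final class BlockedReadState {
    // Samples the state of a thread that is blocked reading from a socket
    // whose peer never sends anything.
    static Thread.State stateWhileBlockedOnRead() throws IOException, InterruptedException {
        try (ServerSocket server = new ServerSocket(0, 1, InetAddress.getLoopbackAddress());
             Socket client = new Socket(server.getInetAddress(), server.getLocalPort());
             Socket accepted = server.accept()) {
            Thread reader = new Thread(() -> {
                try {
                    client.getInputStream().read(); // blocks: the peer never writes
                } catch (IOException ignored) {
                    // Can happen once the sockets are closed at the end of the demo.
                }
            });
            reader.setDaemon(true);
            reader.start();
            Thread.sleep(500); // give the reader time to enter the blocking read
            return reader.getState();
        }
    }
}
```

The thread uses no CPU while it waits; RUNNABLE here only means that the JVM does not track what happens inside the native read.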
As others have already mentioned, threads in native methods are always reported as RUNNABLE, as the JVM doesn't know/care about them.
The Oracle drivers on the client side have no socket timeout by default. This means that if you have network issues, the client's low-level socket may get stuck there forever, resulting in a maxed-out connection pool. You could also check the network traffic towards the Oracle server to see whether any data is being transmitted.
When using the thin client, you can set oracle.jdbc.ReadTimeout, but I don't know how to do that for the thick (OCI) client you use; I'm not familiar with it.
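For the thin driver, the property mentioned above is usually passed as a connection property. A minimal sketch, assuming the value is interpreted as milliseconds (worth confirming against the documentation of your driver version); ThinDriverTimeouts is a hypothetical helper name.

```java
import java.util.Properties;

final class ThinDriverTimeouts {
    // Builds JDBC connection properties that include a socket read timeout
    // for the Oracle thin driver.
    static Properties connectionProperties(String user, String password, int readTimeoutMillis) {
        Properties props = new Properties();
        props.setProperty("user", user);
        props.setProperty("password", password);
        // Property name taken from the answer above; the unit (milliseconds) is an assumption.
        props.setProperty("oracle.jdbc.ReadTimeout", Integer.toString(readTimeoutMillis));
        return props;
    }
}
```

The resulting object can then be handed to the standard DriverManager.getConnection(url, props) call, with the JDBC URL of your own database.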
What to do? Research how you can specify a read timeout for the thick ojdbc driver, and watch for exceptions related to connection timeouts, which will clearly signal network issues. If you can change the source, you can wrap the calls and retry the session when you catch timeout-related SQLExceptions.
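The "wrap the calls and retry" idea could be sketched as follows. RetryOnTimeout and SqlCall are hypothetical names, and which exception types a given driver raises on a read timeout is an assumption here; SQLTimeoutException and SQLRecoverableException are the standard JDBC types that seem the most likely candidates.

```java
import java.sql.SQLException;
import java.sql.SQLRecoverableException;
import java.sql.SQLTimeoutException;

final class RetryOnTimeout {
    // A database call that may fail with an SQLException.
    @FunctionalInterface
    interface SqlCall<T> {
        T run() throws SQLException;
    }

    // Runs the call up to maxAttempts times, retrying only on timeout-like failures.
    // Each attempt should obtain a fresh connection inside the call, because the
    // connection that timed out is probably no longer usable.
    static <T> T callWithRetry(SqlCall<T> call, int maxAttempts) throws SQLException {
        if (maxAttempts < 1) {
            throw new IllegalArgumentException("maxAttempts must be at least 1");
        }
        SQLException last = null;
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            try {
                return call.run();
            } catch (SQLTimeoutException | SQLRecoverableException e) {
                last = e; // assumed to signal a timeout or broken connection; try again
            }
        }
        throw last; // every attempt failed with a timeout-like error
    }
}
```

Any other SQLException (a constraint violation, for example) is not caught and propagates immediately, so only failures that look like network trouble are retried.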
To quickly address the issue, terminate the connection on the Oracle server manually.
Worth checking the session contention, maybe a query blocks these sessions. If you find one, you'll see which database object causes the problem.
Does your code handle transactions manually? If so, maybe some code didn't commit() after changing data. Or maybe someone ran a data modification query directly through PL/SQL or a similar tool and didn't commit, which left all read operations hanging.
When you experienced that hang and the DB recovered, did you check whether some of the data was rolled back? I ask because you said "It was recovered post application restart." This happens when the JDBC driver has changed data but not committed, and a timeout occurs: the DB operation is rolled back (this may differ depending on the configuration).
Native methods remain always in RUNNABLE state (ok, unless you change the state from the native method, itself, but this doesn't count).
The method can be blocked on IO, any other event waiting or just long cpu intense task... or endless loop.
You can make your own pick.
how to recover under this case ?
Drop the connection from Oracle.
Is the system or the JVM hanging?
If it is configurable and possible, reduce the number of threads/parallel connections.
The threads simply waste CPU cycles while waiting for IO.
Yes, your CPU is unfortunately kept busy by the threads that are awaiting a response from the DB.