I made wrote a little piece of camel to consume a ftp server.
But after it was running for some time, it throws an exception, keeps running but doesn't consume anything any more. Also when I start it again and there are a larger number of file waiting to be consumed it will crash again. I already added an exception handler but it seems like is doesn't catch the exceptions.
This is the exception I receive:
Caused by: [org.apache.camel.component.file.GenericFileOperationFailedException - File operation failed: 150 Opening ASCII mode data connection for 2386442.XML(3895 bytes).
Accept timed out. Code: 150]
org.apache.camel.component.file.GenericFileOperationFailedException: File operation failed: 150 Opening ASCII mode data connection for 2386442.XML(3895 bytes).
Accept timed out. Code: 150
at org.apache.camel.component.file.remote.FtpOperations.retrieveFileToStreamInBody(FtpOperations.java:336)
at org.apache.camel.component.file.remote.FtpOperations.retrieveFile(FtpOperations.java:297)
at org.apache.camel.component.file.GenericFileConsumer.processExchange(GenericFileConsumer.java:333)
at org.apache.camel.component.file.remote.RemoteFileConsumer.processExchange(RemoteFileConsumer.java:94)
at org.apache.camel.component.file.GenericFileConsumer.processBatch(GenericFileConsumer.java:175)
at org.apache.camel.component.file.GenericFileConsumer.poll(GenericFileConsumer.java:136)
at org.apache.camel.impl.ScheduledPollConsumer.doRun(ScheduledPollConsumer.java:140)
at org.apache.camel.impl.ScheduledPollConsumer.run(ScheduledPollConsumer.java:92)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:441)
at java.util.concurrent.FutureTask$Sync.innerRunAndReset(FutureTask.java:317)
at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:150)
at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$101(ScheduledThreadPoolExecutor.java:98)
at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.runPeriodic(ScheduledThreadPoolExecutor.java:180)
at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:204)
at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
at java.lang.Thread.run(Thread.java:662)
Caused by: java.net.SocketTimeoutException: Accept timed out
at java.net.PlainSocketImpl.socketAccept(Native Method)
at java.net.PlainSocketImpl.accept(PlainSocketImpl.java:408)
at java.net.ServerSocket.implAccept(ServerSocket.java:462)
at java.net.ServerSocket.accept(ServerSocket.java:430)
at org.apache.commons.net.ftp.FTPClient._openDataConnection_(FTPClient.java:560)
at org.apache.commons.net.ftp.FTPClient.retrieveFile(FTPClient.java:1442)
at org.apache.camel.component.file.remote.FtpOperations.retrieveFileToStreamInBody(FtpOperations.java:328)
... 16 more
Caused by: [org.apache.camel.component.file.GenericFileOperationFailedException - Cannot retrieve file: GenericFile[2386448.XML] from: Endpoint[ftp://1.1.1.1?delay=15000&delete=true&disconnect=true&exclude=((?i).*pdf$)&password=******&username=user]
org.apache.camel.component.file.GenericFileOperationFailedException: Cannot retrieve file: GenericFile[2386448.XML] from: Endpoint[ftp://1.1.1.1?delay=15000&delete=true&disconnect=true&exclude=((?i).*pdf$)&password=******&username=user]
at org.apache.camel.component.file.GenericFileConsumer.processExchange(GenericFileConsumer.java:338)
at org.apache.camel.component.file.remote.RemoteFileConsumer.processExchange(RemoteFileConsumer.java:94)
at org.apache.camel.component.file.GenericFileConsumer.processBatch(GenericFileConsumer.java:175)
at org.apache.camel.component.file.GenericFileConsumer.poll(GenericFileConsumer.java:136)
at org.apache.camel.impl.ScheduledPollConsumer.doRun(ScheduledPollConsumer.java:140)
at org.apache.camel.impl.ScheduledPollConsumer.run(ScheduledPollConsumer.java:92)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:441)
at java.util.concurrent.FutureTask$Sync.innerRunAndReset(FutureTask.java:317)
at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:150)
at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$101(ScheduledThreadPoolExecutor.java:98)
at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.runPeriodic(ScheduledThreadPoolExecutor.java:180)
at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:204)
at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
at java.lang.Thread.run(Thread.java:662)
And this is the route I made using the Java DSL:
// XML Predicate
// only allows names without spaces
Predicate xmlPredicate = header(RssUtils.CAMEL_FILE_NAME).regex(
"([\\S]+(\\.(?i)(xml))$)");
// Images Predicate
// only allows names without spaces
Predicate imgPredicate = header(RssUtils.CAMEL_FILE_NAME).regex(
"([\\S]+(\\.(?i)(jpg|png|gif))$)");
onException(SchemaValidationException.class).to(
"file://" + props.getProperty(RssUtils.ROOT_DIR)
+ "/errors/SchemaValidationException");
onException(GenericFileOperationFailedException.class).to(
"file://" + props.getProperty(RssUtils.ROOT_DIR)
+ "/errors/GenericFileExceptions");
from(
"ftp://"
+ props.getProperty(RssUtils.FTP_URL)
+ "?username="
+ props.getProperty(RssUtils.FTP_USER)
+ "&password="
+ props.getProperty(RssUtils.FTP_PWD)
+ "&disconnect=true&delete=true&exclude=((?i).*pdf$)&delay="
+ props.getProperty(RssUtils.FTP_DELAY))
.choice()
.when(xmlPredicate)
.to("jms:xmlQueue")
.to("jms:archiveQueue")
.when(imgPredicate)
.to("file://" + props.getProperty(RssUtils.ROOT_DIR) + "/img")
.otherwise()
.to("file://" + props.getProperty(RssUtils.ROOT_DIR)
+ "/errors/other");
from("jms:xmlQueue").to("validator:FtpXmlValidator.xsd")
.to("xslt://XmlToRssConverter.xsl")
.process(rssFeedProcessor)
.to("file://" + props.getProperty(RssUtils.ROOT_DIR) + "/rss/");
from("jms:archiveQueue")
.to("file://" + props.getProperty(RssUtils.ROOT_DIR) + "/archive/");
Is there anything I can do to avoid this kind af behavior? It is really difficult to test so I'm hoping somebody spots a flaw in my code. I have searching for quite some time now but I don't find anything solid. Maybe some way I could debug this issue?
There maybe a few things that I found somebody could give his tought on:
use handled(true) when using the onException
can I set the max batch size of the consumer? (can I use throttle for this ?)
use explicit try catch finally because I'm using the Java DSL
Don't shoot me if I'm saying anything wrong here, I just learning Camel.
So if anybody has suggestions on the code above I would appreciate it!
Thanks a lot in advance!
What you have here is an FTP issue, the fact that it is occurring in Apache Camel is largely irrelevant.
The telltale part of the bomb is:
at java.net.PlainSocketImpl.socketAccept(Native Method)
at java.net.PlainSocketImpl.accept(PlainSocketImpl.java:408)
at java.net.ServerSocket.implAccept(ServerSocket.java:462)
at java.net.ServerSocket.accept(ServerSocket.java:430)
at org.apache.commons.net.ftp.FTPClient.openDataConnection(FTPClient.java:560)
The openDataConnection method of org.apache.commons.net.ftp.FTPClient is there to support active mode FTP - passive mode just uses the same port as for commands, so it doesn't need a separate port connection.
Try switching to passive mode (passiveMode = true with Apache Camel).
On the surface, without looking at WHY your route is failing, it sounds like what you're looking to do is handle and continue--i.e., handle this exception and continue your route where you left off. Per the documentation:
Available as of Camel 2.3
In Camel 2.3 we introduced a new option continued which allows you to both handle and continue routing in the original route as if the exception did not occur.
For instance to just ignore and continue if the IDontCareException was thrown we can do this:
onException(IDontCareException).continued(true);
What happens here is:
Camel will catch the exception and . . . just ignore it and continue routing in the original route. However . . . it will route that [onException] route first, before it will continue routing in the original route.
Give this a try and it may solve your problem. As I implied, above, depending on what your root issue is, this may be more of a bandaid than a proper solution. The better approach might be to figure out why the FTP consumer is failing. At a glance, it appears that it can't find the file named 2386448.XML.
Once you determine the root cause, you can use a choice to behave differently at the right time as in:
.choice()
.when(isValidFtpResponse())
.to(DIRECT_CONTINUE_FTP_ROUTE)
.otherwise()
.setBody(constant(null))
.log(ERROR, "FTP failed: ${headers}")
.end()
Hopefully, that gives you a few ideas and helps you get passed this issue.
Related
I know there are a ton of these posts, but this is a little different. We are using vended code for part of our data processing system, and part of the system sends emails to clients if certain events take place on data insertion or deletion. Recently we have started getting address already in use exceptions. We checked the repository history, and nothing has changed in our code in the last 6 months for this system. We have already tried the typical solutions for this issue including increasing the number of connections allowed to the port with little success. We had a meeting with the vendor, and I asked if anything had changed in their code, and if they would assure that all connections in their code are explicitly closed. They indicated that they are explicitly closing all sockets. However they didn't show us the code so there is no way for us to know if this is true other than taking their word for it. So, the only thing I can think of to do is continue to increase the number of connections to the port until we stop getting bind exceptions. So, what is the industry standard for max number of connections to port 25; is there one? Also if anyone has any other suggestions I would greatly appreciate it? Thanks so much in advance, Robert
20210505112127.716 ERROR m.fiserv.ppx.business.notification.EmailNotifier : MessagingException from notify
javax.mail.MessagingException: Could not connect to SMTP host: SERVER.URL.COM, port: 25;
nested exception is:
java.net.BindException: Address already in use: connect
at com.sun.mail.smtp.SMTPTransport.openServer(SMTPTransport.java:1545)
at com.sun.mail.smtp.SMTPTransport.protocolConnect(SMTPTransport.java:453)
Caused by:
java.net.BindException: Address already in use: connect
at java.net.DualStackPlainSocketImpl.connect0(Native Method)
at java.net.DualStackPlainSocketImpl.socketConnect(DualStackPlainSocketImpl.java:90)
at java.net.AbstractPlainSocketImpl.doConnect(AbstractPlainSocketImpl.java:380)
20210505131529.950 ERROR erv.ppx.web.controller.AuditReportViewController : Error while generating HTML
net.sf.jasperreports.engine.JRException: Error writing to OutputStream writer : CorpAdminAuditReport
at net.sf.jasperreports.engine.export.JRHtmlExporter.exportReport(JRHtmlExporter.java:496)
at com.fiserv.ppx.web.controller.AuditReportViewController.generateReport(AuditReportViewController.java:184)
Caused by:
com.ibm.wsspi.webcontainer.ClosedConnectionException: OutputStream encountered error during write
at com.ibm.ws.webcontainer.channel.WCCByteBufferOutputStream.write(WCCByteBufferOutputStream.java:188)
at com.ibm.ws.webcontainer.srt.SRTOutputStream.write(SRTOutputStream.java:97)
20210505140706.240 ERROR com.fiserv.ppx.business.db.DBConnectionUtil : Exception in getting for AppServer connection from DataSource.
com.ibm.websphere.ce.cm.ConnectionWaitTimeoutException: J2CA1010E: Connection not available; timed out waiting for 180,005 seconds.
at com.ibm.ws.rsadapter.AdapterUtil.toSQLException(AdapterUtil.java:1680)
at com.ibm.ws.rsadapter.jdbc.WSJdbcDataSource.getConnection(WSJdbcDataSource.java:661)
at com.ibm.ws.rsadapter.jdbc.WSJdbcDataSource.getConnection(WSJdbcDataSource.java:611)
Caused by:
com.ibm.websphere.ce.j2c.ConnectionWaitTimeoutException: J2CA1010E: Connection not available; timed out waiting for 180,005 seconds.
at com.ibm.ejs.j2c.FreePool.createOrWaitForConnection(FreePool.java:1781)
at com.ibm.ejs.j2c.PoolManager.reserve(PoolManager.java:3834)
at com.ibm.ejs.j2c.PoolManager.reserve(PoolManager.java:3082)
20210505140731.341 ERROR com.fiserv.ppx.business.db.DBConnectionUtil : Exception in getting for AppServer connection from DataSource.
com.ibm.websphere.ce.cm.ConnectionWaitTimeoutException: J2CA1010E: Connection not available; timed out waiting for 180,010 seconds.
at com.ibm.ws.rsadapter.AdapterUtil.toSQLException(AdapterUtil.java:1680)
at com.ibm.ws.rsadapter.jdbc.WSJdbcDataSource.getConnection(WSJdbcDataSource.java:661)
at com.ibm.ws.rsadapter.jdbc.WSJdbcDataSource.getConnection(WSJdbcDataSource.java:611)
Caused by:
com.ibm.websphere.ce.j2c.ConnectionWaitTimeoutException: J2CA1010E: Connection not available; timed out waiting for 180,010 seconds.
at com.ibm.ejs.j2c.FreePool.createOrWaitForConnection(FreePool.java:1781)
at com.ibm.ejs.j2c.PoolManager.reserve(PoolManager.java:3904)
at com.ibm.ejs.j2c.PoolManager.reserve(PoolManager.java:3082)
20210505140731.341 ERROR com.fiserv.ppx.sso.controller.SSOController : SSO Configuration error
java.lang.NullPointerException
at com.fiserv.ppx.business.db.PPXDbTransactionManager.<init>(PPXDbTransactionManager.java:60)
at com.fiserv.ppx.sso.impl.SSOLoginAuthenticator.authenticateSSOUser(SSOLoginAuthenticator.java:157)
I'm trying to use Reactor Netty TcpClient in reactive way to interact with hosts, that may be unreachable. Here is an example of a channel initialization logic:
ConnectionProvider connectionProvider = ConnectionProvider.fixed("fixed", 50);
TcpClient.create(connectionProvider)
.host(host).port(port)
.wiretap(true)
.option(ChannelOption.CONNECT_TIMEOUT_MILLIS, 50)
.doOnConnect(x -> log.trace("Connect to {}:{}", host, port))
.doOnConnected(conn -> log.trace("Connected {}", conn.channel()))
.connect()
.subscribe(this::utilizeConnection);
the output, that i receiving :
2019-09-04 08:23:13.612 TRACE 71988 --- [ioEventLoop-4-3] c.c.pcb.poc.network.tcp.NettyTcpSender : Connect to 192.168.88.210:2000
2019-09-04 08:23:13.684 WARN 71988 --- [actor-tcp-nio-4] io.netty.util.concurrent.DefaultPromise : An exception was thrown by reactor.netty.resources.PooledConnectionProvider$DisposableAcquire.operationComplete()
reactor.core.Exceptions$ErrorCallbackNotImplemented: io.netty.channel.ConnectTimeoutException: connection timed out: /192.168.88.210:2000
Caused by: io.netty.channel.ConnectTimeoutException: connection timed out: /192.168.88.210:2000
at io.netty.channel.nio.AbstractNioChannel$AbstractNioUnsafe$1.run(AbstractNioChannel.java:267) ~[netty-transport-4.1.36.Final.jar:4.1.36.Final]
at io.netty.util.concurrent.PromiseTask$RunnableAdapter.call(PromiseTask.java:38) ~[netty-common-4.1.36.Final.jar:4.1.36.Final]
at io.netty.util.concurrent.ScheduledFutureTask.run(ScheduledFutureTask.java:127) ~[netty-common-4.1.36.Final.jar:4.1.36.Final]
at io.netty.util.concurrent.AbstractEventExecutor.safeExecute(AbstractEventExecutor.java:163) [netty-common-4.1.36.Final.jar:4.1.36.Final]
at io.netty.util.concurrent.SingleThreadEventExecutor.runAllTasks(SingleThreadEventExecutor.java:405) [netty-common-4.1.36.Final.jar:4.1.36.Final]
at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:500) [netty-transport-4.1.36.Final.jar:4.1.36.Final]
at io.netty.util.concurrent.SingleThreadEventExecutor$5.run(SingleThreadEventExecutor.java:906) [netty-common-4.1.36.Final.jar:4.1.36.Final]
at io.netty.util.internal.ThreadExecutorMap$2.run(ThreadExecutorMap.java:74) [netty-common-4.1.36.Final.jar:4.1.36.Final]
at java.lang.Thread.run(Thread.java:748) [na:1.8.0_181]
Suppressed: reactor.core.publisher.FluxOnAssembly$OnAssemblyException:
Assembly trace from producer [reactor.core.publisher.MonoCreate] :
reactor.core.publisher.Mono.create(Mono.java:183)
reactor.netty.resources.PooledConnectionProvider.acquire(PooledConnectionProvider.java:130)
Error has been observed by the following operator(s):
|_ Mono.create ⇢ reactor.netty.resources.PooledConnectionProvider.acquire(PooledConnectionProvider.java:130)
|_ Mono.doOnSubscribe ⇢ reactor.netty.tcp.TcpClientDoOn.connect(TcpClientDoOn.java:58)
The 'inbound' and 'outbound' are having a dedicated method to handle their errors, but they works on top of a Connection instance that won't be created if you got the 'connection timeout'.
I tried:
The exception, that i receiving is wrapped in 'ErrorCallbackNotImplemented'. But I wasn't able to find any way to implement any 'ErrorCallback'
The log contains a warning message from 'io.netty.util.concurrent.DefaultPromise' . but I wasn't able to find a way how to make own Promise to handle it in a right way.
No any configurations i've found that may somehow intercept connection timeouts.
workaround. The blocked approach to create a connection ( .block() instead of .subscribe()) will allow me to catch any Connection creating exceptions within plain try-catch block, but i'll lose the benefits of reactive approach with such workaround.
Do somebody may suggest me at least something to help me to find a right way to handle a 'io.netty.channel.ConnectTimeoutException'?
Do not forget to implement your error callback
Usually reactor.core.Exceptions$ErrorCallbackNotImplemented happens when there is subscription over labmda based .subscribe method (same for Mono and Flux).
If you are going to look at the sources here and here, you will find the place where reactor.core.Exceptions$ErrorCallbackNotImplemented is thrown!
Action Points
In order to handle the original io.netty.channel.ConnectTimeoutException I would recommend looking at Handling Errors section of the original Project Reactor documentation
I've got a problem with a Spring web application that periodically runs into an error fetching a connection from my connection pool. Eventually in the logs I see entries like:
Caused by: javax.persistence.PersistenceException: org.hibernate.exception.JDBCConnectionException: Unable to acquire JDBC Connection
Caused by: java.sql.SQLTransientConnectionException: HikariPool-1 - Connection is not available, request timed out after 30000ms.
Only way to recover I've found once it hits this point is to restart Tomcat.
I think the most likely explanation is I have some code somewhere that is not properly cleaning up its connection - returning it to Hikari, leaving something open so Spring can't clean it up, etc.
To troubleshoot I've set my hikari config leakDetectionThreshold to 5000ms and enabled logging. After that, I see log entries like
2018-04-24 19:53:56 WARN ProxyLeakTask:87 - Connection leak detection
triggered for org.postgresql.jdbc.PgConnection#664ec666, stack trace
follows
java.lang.Exception: Apparent connection leak detected
at org.hibernate.engine.jdbc.connections.internal.DatasourceConnectionProviderImpl.getConnection(DatasourceConnectionProviderImpl.java:122)
at org.hibernate.internal.NonContextualJdbcConnectionAccess.obtainConnection(NonContextualJdbcConnectionAccess.java:35)
at org.hibernate.resource.jdbc.internal.LogicalConnectionManagedImpl.acquireConnectionIfNeeded(LogicalConnectionManagedImpl.java:99)
at org.hibernate.resource.jdbc.internal.LogicalConnectionManagedImpl.getPhysicalConnection(LogicalConnectionManagedImpl.java:129)
at org.hibernate.engine.jdbc.internal.StatementPreparerImpl.connection(StatementPreparerImpl.java:47)
at org.hibernate.engine.jdbc.internal.StatementPreparerImpl$5.doPrepare(StatementPreparerImpl.java:146)
at org.hibernate.engine.jdbc.internal.StatementPreparerImpl$StatementPreparationTemplate.prepareStatement(StatementPreparerImpl.java:172)
at org.hibernate.engine.jdbc.internal.StatementPreparerImpl.prepareQueryStatement(StatementPreparerImpl.java:148)
at org.hibernate.loader.Loader.prepareQueryStatement(Loader.java:1940)
at org.hibernate.loader.Loader.executeQueryStatement(Loader.java:1909)
at org.hibernate.loader.Loader.executeQueryStatement(Loader.java:1887)
at org.hibernate.loader.Loader.doQuery(Loader.java:932)
at org.hibernate.loader.Loader.doQueryAndInitializeNonLazyCollections(Loader.java:349)
at org.hibernate.loader.Loader.doList(Loader.java:2615)
at org.hibernate.loader.Loader.doList(Loader.java:2598)
at org.hibernate.loader.Loader.listIgnoreQueryCache(Loader.java:2430)
at org.hibernate.loader.Loader.list(Loader.java:2425)
at org.hibernate.loader.custom.CustomLoader.list(CustomLoader.java:335)
at org.hibernate.internal.SessionImpl.listCustomQuery(SessionImpl.java:2129)
at org.hibernate.internal.AbstractSharedSessionContract.list(AbstractSharedSessionContract.java:981)
at org.hibernate.query.internal.NativeQueryImpl.doList(NativeQueryImpl.java:147)
at org.hibernate.query.internal.AbstractProducedQuery.list(AbstractProducedQuery.java:1398)
at org.hibernate.query.internal.AbstractProducedQuery.getSingleResult(AbstractProducedQuery.java:1444)
at sun.reflect.GeneratedMethodAccessor191.invoke(Unknown Source)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at org.springframework.orm.jpa.SharedEntityManagerCreator$DeferredQueryInvocationHandler.invoke(SharedEntityManagerCreator.java:379)
at com.sun.proxy.$Proxy163.getSingleResult(Unknown Source)
at com.mycompany.web.jpa.util.DBHelper.getPagedMappedDbResults(DBHelper.java:76)
at com.mycompany.web.jpa.repository.TaskRepositoryImpl.findTaskDetailsByStepIdAndIdIn(TaskRepositoryImpl.java:245)
......
So it is detecting a possible leak. Could be a false positive I suppose? But this is also the only class in my app that is doing database access outside of the standard service/repository pattern often used in Spring apps, so it seems like a likely culprit, and it's my best lead at the moment.
Anyway, the last piece of non library code I see in the trace (ie stuff I wrote, so most likely to be the cause of the leak!) is my DBHelper::getPagedMappedDbResults method, relevant bit included here:
Query q = entityManager.createNativeQuery(countQueryText);
setQueryParameters(q, parameters);
long numActualResults = 0;
try {
numActualResults = ((Number)q.getSingleResult()).longValue(); // line 76
} catch (Exception e) {
System.out.println("just in case: " + e);
}
So basically I create a Query object from my EntityManager instance, set some parameters, and run it to get some results.
Is there something I need to be doing with a Query object when I'm done with it? q.cleanup()? I don't see anything like this from reading the docs, but am I not doing good housekeeping on this resource?
The entityManager itself is created from an #Autowired annotation. My understanding is if I didn't "new" it to instantiate it and instead let the Spring framework autowire it, then Spring will do whatever cleanup is necessary. Is that right? Or do I need to be doing some cleanup after I use the entityManager?
Version details:
Tomcat 8 / Java 8
Spring 5.0.0.RELEASE
Spring Data Kay-RELEASE
Hibernate 5.2.3.Final
Hikari 2.4.5
Any advice or suggestions would be greatly appreciated, thanks!
What is the query? Is it heavy? Maybe you have deadlock here? Connection management looks fine. You do not acquire connection explicitly, so no need to release it. The query might be long running so Hibernate is not able to complete it and release the connection.
Also, you can check the number of open connections on the DB side. Do some analysis on that side as well.
We are using Jersey Server-Sent Events (SSE) to allow remote components of our application to listen to events raised by our Jersey/Tomcat server. This works great.
However, it is crucial that our server have an accurate list of currently-connected listeners (our remote components). To this end, our server sends a tiny message to each caller (via eventOutput.write) once every five seconds. If our remote component is shut down while SSE-connected, or if the remote computer is powered off while SSE-connected, our server's eventOutput.write throws the ClientAbortException/SocketException exception shown below. That's perfect: we catch the exception, mark that caller as no longer connected, and move on.
Now, for the problem. As I mentioned, eventOutput.write throws an exception in cases where our remote component software is not running, or where the computer it runs on has been powered down. However, there are two cases where calling eventOutput.write to a no-longer-connected computer does NOT throw an exception: 1) if the Ethernet cable of the remote computer is simply pulled while the caller is SSE-connected, and 2) if the network adapter in the remote computer is turned off (i.e., by an administrative action) while the caller is SSE-connected. In these two cases, we can call eventOutput.write to the remote computer every five seconds for hours and no exception is thrown. This makes it impossible to detect that the remote computer is no longer connected.
I see that EventOutput (and ChunkedOutput) has very few methods and properties, but I wonder if there is any way to configure or use it that will cause an exception to be thrown when writing to a remote computer that has been disconnected by having its Ethernet cable pulled or network adapter turned off.
And here is the (good/useful) exception we get in cases where eventOutput.write DOES throw the exception we want:
org.apache.catalina.connector.ClientAbortException: null
at org.apache.catalina.connector.OutputBuffer.doFlush(OutputBuffer.java:371) ~[catalina.jar:7.0.53]
at org.apache.catalina.connector.OutputBuffer.flush(OutputBuffer.java:333) ~[catalina.jar:7.0.53]
at org.apache.catalina.connector.CoyoteOutputStream.flush(CoyoteOutputStream.java:101) ~[catalina.jar:7.0.53]
at org.glassfish.jersey.servlet.internal.ResponseWriter$NonCloseableOutputStreamWrapper.flush(ResponseWriter.java:303) ~[jaxrs-ri-2.13.jar:2.13.]
at org.glassfish.jersey.message.internal.CommittingOutputStream.flush(CommittingOutputStream.java:292) ~[jaxrs-ri-2.13.jar:2.13.]
at org.glassfish.jersey.server.ChunkedOutput$1.call(ChunkedOutput.java:240) ~[jaxrs-ri-2.13.jar:2.13.]
at org.glassfish.jersey.server.ChunkedOutput$1.call(ChunkedOutput.java:190) ~[jaxrs-ri-2.13.jar:2.13.]
at org.glassfish.jersey.internal.Errors.process(Errors.java:315) ~[jaxrs-ri-2.13.jar:2.13.]
at org.glassfish.jersey.internal.Errors.process(Errors.java:242) ~[jaxrs-ri-2.13.jar:2.13.]
at org.glassfish.jersey.process.internal.RequestScope.runInScope(RequestScope.java:347) ~[jaxrs-ri-2.13.jar:2.13.]
at org.glassfish.jersey.server.ChunkedOutput.flushQueue(ChunkedOutput.java:190) ~[jaxrs-ri-2.13.jar:2.13.]
at org.glassfish.jersey.server.ChunkedOutput.write(ChunkedOutput.java:180) ~[jaxrs-ri-2.13.jar:2.13.]
at com.appserver.webservice.AgentSsePollingManager$ConnectionChecker.run(AgentSsePollingManager.java:174) ~[AgentSsePollingManager$ConnectionChecker.class:na]
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471) [na:1.7.0_71]
at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:304) [na:1.7.0_71]
at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(ScheduledThreadPoolExecutor.java:178) [na:1.7.0_71]
at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:293) [na:1.7.0_71]
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) [na:1.7.0_71]
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) [na:1.7.0_71]
at java.lang.Thread.run(Thread.java:745) [na:1.7.0_71]
Caused by: java.net.SocketException: Broken pipe
at java.net.SocketOutputStream.socketWrite0(Native Method) ~[na:1.7.0_71]
at java.net.SocketOutputStream.socketWrite(SocketOutputStream.java:113) ~[na:1.7.0_71]
at java.net.SocketOutputStream.write(SocketOutputStream.java:159) ~[na:1.7.0_71]
at org.apache.coyote.http11.InternalOutputBuffer.realWriteBytes(InternalOutputBuffer.java:215) ~[tomcat-coyote.jar:7.0.53]
at org.apache.tomcat.util.buf.ByteChunk.flushBuffer(ByteChunk.java:480) ~[tomcat-coyote.jar:7.0.53]
at org.apache.coyote.http11.InternalOutputBuffer.flush(InternalOutputBuffer.java:119) ~[tomcat-coyote.jar:7.0.53]
at org.apache.coyote.http11.AbstractHttp11Processor.action(AbstractHttp11Processor.java:799) ~[tomcat-coyote.jar:7.0.53]
at org.apache.coyote.Response.action(Response.java:174) ~[tomcat-coyote.jar:7.0.53]
at org.apache.catalina.connector.OutputBuffer.doFlush(OutputBuffer.java:366) ~[catalina.jar:7.0.53]
... 19 common frames omitted
I do not think it will be possible to account for all the possible failures by adding the code around SSE sockets even if Jersey adds all information accessible from the socket interface. The only viable solution is proper two way communication. In case of SSE chunked output stream, pulled cable does not cause any interruption because nothing is supposed to tell it that remote host is now unreachable (until OS closes the socket).
Your first step is right - implement heartbeats every N seconds. Then all you need to do is to report back with another tiny http call every so often that you still listen. It is up to you to do acknowledgments every 5 seconds or every minute - depends on how fast do you need a problem detection.
You can do it in the same Jersey Resource by implementing #POST (in RESTful terms it reads "create new ack that you receive events").
Note: browsers are good at re-establishing SSE connection on their own in case of network interrupts, no need to fiddle with it.
We are porting a simple Java application between Tandem NonStop systems, from G-Series to H-Series. Java version is 1.5.0_02.
When performing basic I/O tasks like getting output stream from or opening a client socket, we receive exceptions like
java.io.IOException: Value out of range
or
java.net.SocketException: Value out of range
("value out of range" is Tandem native jargon for, well, quite everything I suppose).
Has anybody got similar issues? i.e. I/O corruption while for example messing with JNI?
I suppose there is something wrong with the system, but where might it be?
Thank you.
EDIT:
adding snippets as requested
sample snippet (a) - using Runtime.exec () (adapted)
Properties envVars = new Properties();
Process p = r.exec("/bin/env");
envVars.load(p.getInputStream());
Stack trace (a):
java.io.IOException: Value out of range (errno:4034)
at java.io.FileInputStream.readBytes(Native Method)
at java.io.FileInputStream.read(FileInputStream.java:194)
at java.lang.UNIXProcess$DeferredCloseInputStream.read(UNIXProcess.java:221)
at java.io.BufferedInputStream.read1(BufferedInputStream.java:254)
at java.io.BufferedInputStream.read(BufferedInputStream.java:313)
at sun.nio.cs.StreamDecoder$CharsetSD.readBytes(StreamDecoder.java:411)
at sun.nio.cs.StreamDecoder$CharsetSD.implRead(StreamDecoder.java:453)
at sun.nio.cs.StreamDecoder.read(StreamDecoder.java:183)
at java.io.InputStreamReader.read(InputStreamReader.java:167)
at java.io.BufferedReader.fill(BufferedReader.java:136)
at java.io.BufferedReader.readLine(BufferedReader.java:299)
at java.io.BufferedReader.readLine(BufferedReader.java:362)
at util.Environment.getVariables(Environment.java:39)
Last line fails, and output gets redirected to console (!).
sample snippet (b) - using HttpURLConnection:
public WorkerThread (HttpURLConnection conn, String requestData, Logger logger)
{
this.conn = conn;
...
}
public void run ()
{
OutputStream out = conn.getOutputStream ();
}
Stack trace (b):
java.net.SocketException: Value out of range (errno:4034)
at java.net.PlainSocketImpl.socketConnect(Native Method)
at java.net.PlainSocketImpl.doConnect(PlainSocketImpl.java:333)
at java.net.PlainSocketImpl.connectToAddress(PlainSocketImpl.java:195)
at java.net.PlainSocketImpl.connect(PlainSocketImpl.java:182)
at java.net.Socket.connect(Socket.java:507)
at sun.net.NetworkClient.doConnect(NetworkClient.java:155)
at sun.net.www.http.HttpClient.openServer(HttpClient.java:365)
at sun.net.www.http.HttpClient.openServer(HttpClient.java:477)
at sun.net.www.protocol.https.HttpsClient.<init>(HttpsClient.java:280)
at sun.net.www.protocol.https.HttpsClient.New(HttpsClient.java:337)
at sun.net.www.protocol.https.AbstractDelegateHttpsURLConnection.getNewHttpClient(AbstractDelegateHttpsURLConnection.java:176)
at sun.net.www.protocol.http.HttpURLConnection.plainConnect(HttpURLConnection.java:736)
at sun.net.www.protocol.https.AbstractDelegateHttpsURLConnection.connect(AbstractDelegateHttpsURLConnection.java:162)
at sun.net.www.protocol.http.HttpURLConnection.getOutputStream(HttpURLConnection.java:828)
at sun.net.www.protocol.https.HttpsURLConnectionImpl.getOutputStream(HttpsURLConnectionImpl.java:230)
Case (a) can be avoided because it was a workaround for other issues with previous JRE version (!), but same behaviour with sockets is really nasty.
Error code 4034 seem to indicate that a specific server is not running in your NonStop cluster. Are you sure that your system is setup properly?
Update: The problem was caused by a spurious .so library.