I'm trying to make my SOAP service more robust, and handle clients that disconnect while the method they called is still running. I'm using #WebService and javax.jws.WebService.
The problem is that the called code ignores the disconnecting client and continues running, and only fails after it tries to return. I'd like to reverse the processing done but don't know how to catch this exception as it's simply printed after the method returns.
javax.xml.ws.WebServiceException: javax.xml.stream.XMLStreamException: java.io.IOException: Broken pipe
at com.sun.xml.internal.ws.encoding.StreamSOAPCodec.encode(StreamSOAPCodec.java:101)
at com.sun.xml.internal.ws.encoding.SOAPBindingCodec.encode(SOAPBindingCodec.java:249)
at com.sun.xml.internal.ws.transport.http.HttpAdapter.encodePacket(HttpAdapter.java:328)
at com.sun.xml.internal.ws.transport.http.HttpAdapter.access$100(HttpAdapter.java:82)
at com.sun.xml.internal.ws.transport.http.HttpAdapter$HttpToolkit.handle(HttpAdapter.java:470)
at com.sun.xml.internal.ws.transport.http.HttpAdapter.handle(HttpAdapter.java:233)
at com.sun.xml.internal.ws.transport.http.server.WSHttpHandler.handleExchange(WSHttpHandler.java:95)
at com.sun.xml.internal.ws.transport.http.server.WSHttpHandler.handle(WSHttpHandler.java:80)
at com.sun.net.httpserver.Filter$Chain.doFilter(Filter.java:65)
at sun.net.httpserver.AuthFilter.doFilter(AuthFilter.java:65)
at com.sun.net.httpserver.Filter$Chain.doFilter(Filter.java:68)
at sun.net.httpserver.ServerImpl$Exchange$LinkHandler.handle(ServerImpl.java:557)
at com.sun.net.httpserver.Filter$Chain.doFilter(Filter.java:65)
at sun.net.httpserver.ServerImpl$Exchange.run(ServerImpl.java:529)
at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
at java.lang.Thread.run(Thread.java:662)
Caused by: javax.xml.stream.XMLStreamException: java.io.IOException: Broken pipe
at com.sun.xml.internal.stream.writers.XMLStreamWriterImpl.flush(XMLStreamWriterImpl.java:388)
at com.sun.xml.internal.ws.streaming.XMLStreamWriterUtil.getOutputStream(XMLStreamWriterUtil.java:86)
at com.sun.xml.internal.ws.message.jaxb.JAXBMessage.writePayloadTo(JAXBMessage.java:308)
at com.sun.xml.internal.ws.message.AbstractMessageImpl.writeTo(AbstractMessageImpl.java:131)
at com.sun.xml.internal.ws.encoding.StreamSOAPCodec.encode(StreamSOAPCodec.java:98)
... 16 more
Caused by: java.io.IOException: Broken pipe
at sun.nio.ch.FileDispatcher.write0(Native Method)
at sun.nio.ch.SocketDispatcher.write(SocketDispatcher.java:29)
at sun.nio.ch.IOUtil.writeFromNativeBuffer(IOUtil.java:69)
at sun.nio.ch.IOUtil.write(IOUtil.java:40)
at sun.nio.ch.SocketChannelImpl.write(SocketChannelImpl.java:336)
at sun.net.httpserver.Request$WriteStream.write(Request.java:397)
at sun.net.httpserver.ChunkedOutputStream.writeChunk(ChunkedOutputStream.java:108)
at sun.net.httpserver.ChunkedOutputStream.flush(ChunkedOutputStream.java:138)
at sun.net.httpserver.PlaceholderOutputStream.flush(ExchangeImpl.java:395)
at java.io.FilterOutputStream.flush(FilterOutputStream.java:123)
at com.sun.xml.internal.stream.writers.UTF8OutputStreamWriter.flush(UTF8OutputStreamWriter.java:140)
at com.sun.xml.internal.stream.writers.XMLStreamWriterImpl.flush(XMLStreamWriterImpl.java:386)
... 20 more
Related
I installed a new subversion repository but I am unable to connect to the repository.
In the old repository it is working as expected. The new repository thorws an exception:
org.eclipse.team.svn.core.connector.SVNConnectorException: svn: E210002: There was a problem while connecting to xn--x7h.example.com:22
at org.polarion.team.svn.connector.svnkit.SVNKitService.handleClientException(SVNKitService.java:59)
at org.polarion.team.svn.connector.svnkit.SVNKitConnector.listEntries(SVNKitConnector.java:1758)
at org.eclipse.team.svn.core.extension.factory.ThreadNameModifier.listEntries(ThreadNameModifier.java:324)
at org.eclipse.team.svn.core.utility.SVNUtility.list(SVNUtility.java:440)
at org.eclipse.team.svn.core.svnstorage.SVNRepositoryContainer.getChildren(SVNRepositoryContainer.java:79)
at org.eclipse.team.svn.core.operation.remote.GetRemoteFolderChildrenOperation.runImpl(GetRemoteFolderChildrenOperation.java:76)
at org.eclipse.team.svn.core.operation.AbstractActionOperation.run(AbstractActionOperation.java:82)
at org.eclipse.team.svn.core.utility.ProgressMonitorUtility.doTask(ProgressMonitorUtility.java:104)
at org.eclipse.team.svn.core.operation.CompositeOperation.runImpl(CompositeOperation.java:99)
at org.eclipse.team.svn.core.operation.AbstractActionOperation.run(AbstractActionOperation.java:82)
at org.eclipse.team.svn.core.operation.LoggedOperation.run(LoggedOperation.java:40)
at org.eclipse.team.svn.core.utility.ProgressMonitorUtility.doTask(ProgressMonitorUtility.java:104)
at org.eclipse.team.svn.core.utility.ProgressMonitorUtility.doTaskExternal(ProgressMonitorUtility.java:90)
at org.eclipse.team.svn.ui.utility.DefaultCancellableOperationWrapper.run(DefaultCancellableOperationWrapper.java:55)
at org.eclipse.team.svn.ui.utility.ScheduledOperationWrapper.run(ScheduledOperationWrapper.java:37)
at org.eclipse.core.internal.jobs.Worker.run(Worker.java:63)
Caused by: org.apache.subversion.javahl.ClientException: svn: E210002: There was a problem while connecting to xn--x7h.example.com:22
at org.apache.subversion.javahl.ClientException.fromException(ClientException.java:117)
at org.tmatesoft.svn.core.javahl17.SVNClientImpl.getClientException(SVNClientImpl.java:1539)
at org.tmatesoft.svn.core.javahl17.SVNClientImpl.list(SVNClientImpl.java:189)
at org.polarion.team.svn.connector.svnkit.SVNKitConnector.listEntries(SVNKitConnector.java:1745)
... 14 more
Caused by: org.tmatesoft.svn.core.SVNException: svn: E210002: There was a problem while connecting to xn--x7h.example.com:22
at org.tmatesoft.svn.core.internal.wc.SVNErrorManager.error(SVNErrorManager.java:70)
at org.tmatesoft.svn.core.internal.wc.SVNErrorManager.error(SVNErrorManager.java:57)
at org.tmatesoft.svn.core.internal.io.svn.SVNSSHConnector.open(SVNSSHConnector.java:145)
at org.tmatesoft.svn.core.internal.io.svn.SVNConnection.open(SVNConnection.java:77)
at org.tmatesoft.svn.core.internal.io.svn.SVNRepositoryImpl.openConnection(SVNRepositoryImpl.java:1273)
at org.tmatesoft.svn.core.internal.io.svn.SVNRepositoryImpl.getLatestRevision(SVNRepositoryImpl.java:172)
at org.tmatesoft.svn.core.internal.wc2.ng.SvnNgRepositoryAccess.getRevisionNumber(SvnNgRepositoryAccess.java:119)
at org.tmatesoft.svn.core.internal.wc2.SvnRepositoryAccess.getLocations(SvnRepositoryAccess.java:195)
at org.tmatesoft.svn.core.internal.wc2.ng.SvnNgRepositoryAccess.createRepositoryFor(SvnNgRepositoryAccess.java:46)
at org.tmatesoft.svn.core.internal.wc2.remote.SvnRemoteList.run(SvnRemoteList.java:36)
at org.tmatesoft.svn.core.internal.wc2.remote.SvnRemoteList.run(SvnRemoteList.java:1)
at org.tmatesoft.svn.core.internal.wc2.SvnOperationRunner.run(SvnOperationRunner.java:21)
at org.tmatesoft.svn.core.wc2.SvnOperationFactory.run(SvnOperationFactory.java:1235)
at org.tmatesoft.svn.core.wc2.SvnOperation.run(SvnOperation.java:294)
at org.tmatesoft.svn.core.javahl17.SVNClientImpl.list(SVNClientImpl.java:187)
... 15 more
Caused by: java.io.IOException: There was a problem while connecting to xn--x7h.example.com:22
at com.trilead.ssh2.Connection.connect(Connection.java:817)
at org.tmatesoft.svn.core.internal.io.svn.ssh.SshHost.openConnection(SshHost.java:225)
at org.tmatesoft.svn.core.internal.io.svn.ssh.SshHost.openSession(SshHost.java:153)
at org.tmatesoft.svn.core.internal.io.svn.ssh.SshSessionPool.openSession(SshSessionPool.java:85)
at org.tmatesoft.svn.core.internal.io.svn.SVNSSHConnector.open(SVNSSHConnector.java:122)
... 27 more
Caused by: java.io.IOException: Key exchange was not finished, connection is closed.
at com.trilead.ssh2.transport.KexManager.getOrWaitForConnectionInfo(KexManager.java:92)
at com.trilead.ssh2.transport.TransportManager.getConnectionInfo(TransportManager.java:231)
at com.trilead.ssh2.Connection.connect(Connection.java:769)
... 31 more
Caused by: java.io.IOException: Cannot negotiate, proposals do not match.
at com.trilead.ssh2.transport.KexManager.handleMessage(KexManager.java:413)
at com.trilead.ssh2.transport.TransportManager.receiveLoop(TransportManager.java:765)
at com.trilead.ssh2.transport.TransportManager$1.run(TransportManager.java:480)
at java.base/java.lang.Thread.run(Thread.java:834)
Any ideas?
I checked on server-side what is going on. So I called tail -f /var/log/auth.log and discovered a message that the client and the server was not able to agree on matching kex-algorithms.
Finally my solution was to enable the line
KexAlgorithms +diffie-hellman-group1-sha1
to the corresponding
/etc/ssh/sshd_config
File and restart the sshd-server.
im trying to run one of the examples coming with the OPC-UA Java Stack and getting the following Exception.
Exception in thread "main" org.opcfoundation.ua.common.RuntimeServiceResultException: org.opcfoundation.ua.common.ServiceResultException: Bad_CertificateInvalid (code=0x80120000, description="2148663296, java.io.IOException: Duplicate extensions not allowed")
at org.opcfoundation.ua.transport.TransportChannelSettings.getServerCertificate(TransportChannelSettings.java:114)
at org.opcfoundation.ua.transport.tcp.io.TcpConnection.initialize(TcpConnection.java:376)
at org.opcfoundation.ua.transport.tcp.io.SecureChannelTcp.initialize(SecureChannelTcp.java:273)
at org.opcfoundation.ua.transport.tcp.io.SecureChannelTcp.initialize(SecureChannelTcp.java:246)
at org.opcfoundation.ua.application.Client.createSecureChannel(Client.java:640)
at org.opcfoundation.ua.application.Client.createSecureChannel(Client.java:555)
at org.opcfoundation.ua.application.Client.createSessionChannel(Client.java:370)
at org.opcfoundation.ua.application.Client.createSessionChannel(Client.java:345)
at org.opcfoundation.ua.examples.SampleClient.main(SampleClient.java:109)
Caused by: org.opcfoundation.ua.common.ServiceResultException: Bad_CertificateInvalid (code=0x80120000, description="2148663296, java.io.IOException: Duplicate extensions not allowed")
at org.opcfoundation.ua.transport.security.Cert.<init>(Cert.java:143)
at org.opcfoundation.ua.transport.TransportChannelSettings.getServerCertificate(TransportChannelSettings.java:112)
... 8 more
Caused by: java.security.cert.CertificateParsingException: java.io.IOException: Duplicate extensions not allowed
at sun.security.x509.X509CertInfo.<init>(X509CertInfo.java:169)
at sun.security.x509.X509CertImpl.parse(X509CertImpl.java:1804)
at sun.security.x509.X509CertImpl.<init>(X509CertImpl.java:195)
at sun.security.provider.X509Factory.engineGenerateCertificate(X509Factory.java:102)
at java.security.cert.CertificateFactory.generateCertificate(CertificateFactory.java:339)
at org.opcfoundation.ua.utils.CertificateUtils.decodeX509Certificate(CertificateUtils.java:193)
at org.opcfoundation.ua.transport.security.Cert.<init>(Cert.java:136)
... 9 more
Caused by: java.io.IOException: Duplicate extensions not allowed
at sun.security.x509.CertificateExtensions.parseExtension(CertificateExtensions.java:115)
at sun.security.x509.CertificateExtensions.init(CertificateExtensions.java:88)
at sun.security.x509.CertificateExtensions.<init>(CertificateExtensions.java:78)
at sun.security.x509.X509CertInfo.parse(X509CertInfo.java:702)
at sun.security.x509.X509CertInfo.<init>(X509CertInfo.java:167)
... 15 more
I tried running another client (c#) and it works.
Using redisTemplate, things work until redis server restart. Then two exceptions occur (see below).
When I stop using redisTemplate.executePipelined, the exception 2 stops happening. After invoking redisTemplate.setEnableTransactionSupport(false);, the exception 1 goes away.
But I have no idea why.
Any ideas?
Exception 1:
org.springframework.data.redis.RedisConnectionFailureException: java.net.SocketException: Broken pipe; nested exception is redis.clients.jedis.ex
ceptions.JedisConnectionException: java.net.SocketException: Broken pipe
at org.springframework.data.redis.connection.jedis.JedisExceptionConverter.convert(JedisExceptionConverter.java:67)
at org.springframework.data.redis.connection.jedis.JedisExceptionConverter.convert(JedisExceptionConverter.java:41)
at org.springframework.data.redis.PassThroughExceptionTranslationStrategy.translate(PassThroughExceptionTranslationStrategy.java:37)
at org.springframework.data.redis.FallbackExceptionTranslationStrategy.translate(FallbackExceptionTranslationStrategy.java:37)
at org.springframework.data.redis.connection.jedis.JedisConnection.convertJedisAccessException(JedisConnection.java:212)
at org.springframework.data.redis.connection.jedis.JedisConnection.hGetAll(JedisConnection.java:2890)
at org.springframework.data.redis.connection.DefaultStringRedisConnection.hGetAll(DefaultStringRedisConnection.java:360)
at org.springframework.data.redis.core.DefaultHashOperations$13.doInRedis(DefaultHashOperations.java:223)
at org.springframework.data.redis.core.DefaultHashOperations$13.doInRedis(DefaultHashOperations.java:220)
at org.springframework.data.redis.core.RedisTemplate.execute(RedisTemplate.java:204)
at org.springframework.data.redis.core.RedisTemplate.execute(RedisTemplate.java:166)
at org.springframework.data.redis.core.AbstractOperations.execute(AbstractOperations.java:88)
at org.springframework.data.redis.core.DefaultHashOperations.entries(DefaultHashOperations.java:220)
....
Caused by: redis.clients.jedis.exceptions.JedisConnectionException: java.net.SocketException: Broken pipe
at redis.clients.jedis.Protocol.sendCommand(Protocol.java:104)
at redis.clients.jedis.Protocol.sendCommand(Protocol.java:84)
at redis.clients.jedis.Connection.sendCommand(Connection.java:127)
at redis.clients.jedis.BinaryClient.hgetAll(BinaryClient.java:294)
at redis.clients.jedis.BinaryJedis.hgetAll(BinaryJedis.java:1027)
at org.springframework.data.redis.connection.jedis.JedisConnection.hGetAll(JedisConnection.java:2888)
... 14 common frames omitted
Caused by: java.net.SocketException: Broken pipe
at java.net.SocketOutputStream.socketWrite0(Native Method)
at java.net.SocketOutputStream.socketWrite(SocketOutputStream.java:109)
at java.net.SocketOutputStream.write(SocketOutputStream.java:153)
at redis.clients.util.RedisOutputStream.flushBuffer(RedisOutputStream.java:52)
at redis.clients.util.RedisOutputStream.write(RedisOutputStream.java:59)
at redis.clients.jedis.Protocol.sendCommand(Protocol.java:90)
... 19 common frames omitted
Exception 2:
org.springframework.dao.InvalidDataAccessApiUsageException: Cannot use Jedis when in Pipeline. Please use Pipeline or reset jedis state .; nested
exception is redis.clients.jedis.exceptions.JedisDataException: Cannot use Jedis when in Pipeline. Please use Pipeline or reset jedis state .
at org.springframework.data.redis.connection.jedis.JedisExceptionConverter.convert(JedisExceptionConverter.java:64)
at org.springframework.data.redis.connection.jedis.JedisExceptionConverter.convert(JedisExceptionConverter.java:41)
at org.springframework.data.redis.PassThroughExceptionTranslationStrategy.translate(PassThroughExceptionTranslationStrategy.java:37)
at org.springframework.data.redis.FallbackExceptionTranslationStrategy.translate(FallbackExceptionTranslationStrategy.java:37)
at org.springframework.data.redis.connection.jedis.JedisConnection.convertJedisAccessException(JedisConnection.java:212)
at org.springframework.data.redis.connection.jedis.JedisConnection.hGetAll(JedisConnection.java:2890)
at org.springframework.data.redis.connection.DefaultStringRedisConnection.hGetAll(DefaultStringRedisConnection.java:360)
at org.springframework.data.redis.core.DefaultHashOperations$13.doInRedis(DefaultHashOperations.java:223)
at org.springframework.data.redis.core.DefaultHashOperations$13.doInRedis(DefaultHashOperations.java:220)
at org.springframework.data.redis.core.RedisTemplate.execute(RedisTemplate.java:204)
at org.springframework.data.redis.core.RedisTemplate.execute(RedisTemplate.java:166)
at org.springframework.data.redis.core.AbstractOperations.execute(AbstractOperations.java:88)
at org.springframework.data.redis.core.DefaultHashOperations.entries(DefaultHashOperations.java:220)
...
Caused by: redis.clients.jedis.exceptions.JedisDataException: Cannot use Jedis when in Pipeline. Please use Pipeline or reset jedis state .
at redis.clients.jedis.BinaryJedis.checkIsInMultiOrPipeline(BinaryJedis.java:1761)
at redis.clients.jedis.BinaryJedis.hgetAll(BinaryJedis.java:1026)
at org.springframework.data.redis.connection.jedis.JedisConnection.hGetAll(JedisConnection.java:2888)
... 14 common frames omitted
We are using Jersey Server-Sent Events (SSE) to allow remote components of our application to listen to events raised by our Jersey/Tomcat server. This works great.
However, it is crucial that our server have an accurate list of currently-connected listeners (our remote components). To this end, our server sends a tiny message to each caller (via eventOutput.write) once every five seconds. If our remote component is shut down while SSE-connected, or if the remote computer is powered off while SSE-connected, our server's eventOutput.write throws the ClientAbortException/SocketException exception shown below. That's perfect: we catch the exception, mark that caller as no longer connected, and move on.
Now, for the problem. As I mentioned, eventOutput.write throws an exception in cases where our remote component software is not running, or where the computer it runs on has been powered down. However, there are two cases where calling eventOutput.write to a no-longer-connected computer does NOT throw an exception: 1) if the Ethernet cable of the remote computer is simply pulled while the caller is SSE-connected, and 2) if the network adapter in the remote computer is turned off (i.e., by an administrative action) while the caller is SSE-connected. In these two cases, we can call eventOutput.write to the remote computer every five seconds for hours and no exception is thrown. This makes it impossible to detect that the remote computer is no longer connected.
I see that EventOutput (and ChunkedOutput) has very few methods and properties, but I wonder if there is any way to configure or use it that will cause an exception to be thrown when writing to a remote computer that has been disconnected by having its Ethernet cable pulled or network adapter turned off.
And here is the (good/useful) exception we get in cases where eventOutput.write DOES throw the exception we want:
org.apache.catalina.connector.ClientAbortException: null
at org.apache.catalina.connector.OutputBuffer.doFlush(OutputBuffer.java:371) ~[catalina.jar:7.0.53]
at org.apache.catalina.connector.OutputBuffer.flush(OutputBuffer.java:333) ~[catalina.jar:7.0.53]
at org.apache.catalina.connector.CoyoteOutputStream.flush(CoyoteOutputStream.java:101) ~[catalina.jar:7.0.53]
at org.glassfish.jersey.servlet.internal.ResponseWriter$NonCloseableOutputStreamWrapper.flush(ResponseWriter.java:303) ~[jaxrs-ri-2.13.jar:2.13.]
at org.glassfish.jersey.message.internal.CommittingOutputStream.flush(CommittingOutputStream.java:292) ~[jaxrs-ri-2.13.jar:2.13.]
at org.glassfish.jersey.server.ChunkedOutput$1.call(ChunkedOutput.java:240) ~[jaxrs-ri-2.13.jar:2.13.]
at org.glassfish.jersey.server.ChunkedOutput$1.call(ChunkedOutput.java:190) ~[jaxrs-ri-2.13.jar:2.13.]
at org.glassfish.jersey.internal.Errors.process(Errors.java:315) ~[jaxrs-ri-2.13.jar:2.13.]
at org.glassfish.jersey.internal.Errors.process(Errors.java:242) ~[jaxrs-ri-2.13.jar:2.13.]
at org.glassfish.jersey.process.internal.RequestScope.runInScope(RequestScope.java:347) ~[jaxrs-ri-2.13.jar:2.13.]
at org.glassfish.jersey.server.ChunkedOutput.flushQueue(ChunkedOutput.java:190) ~[jaxrs-ri-2.13.jar:2.13.]
at org.glassfish.jersey.server.ChunkedOutput.write(ChunkedOutput.java:180) ~[jaxrs-ri-2.13.jar:2.13.]
at com.appserver.webservice.AgentSsePollingManager$ConnectionChecker.run(AgentSsePollingManager.java:174) ~[AgentSsePollingManager$ConnectionChecker.class:na]
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471) [na:1.7.0_71]
at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:304) [na:1.7.0_71]
at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(ScheduledThreadPoolExecutor.java:178) [na:1.7.0_71]
at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:293) [na:1.7.0_71]
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) [na:1.7.0_71]
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) [na:1.7.0_71]
at java.lang.Thread.run(Thread.java:745) [na:1.7.0_71]
Caused by: java.net.SocketException: Broken pipe
at java.net.SocketOutputStream.socketWrite0(Native Method) ~[na:1.7.0_71]
at java.net.SocketOutputStream.socketWrite(SocketOutputStream.java:113) ~[na:1.7.0_71]
at java.net.SocketOutputStream.write(SocketOutputStream.java:159) ~[na:1.7.0_71]
at org.apache.coyote.http11.InternalOutputBuffer.realWriteBytes(InternalOutputBuffer.java:215) ~[tomcat-coyote.jar:7.0.53]
at org.apache.tomcat.util.buf.ByteChunk.flushBuffer(ByteChunk.java:480) ~[tomcat-coyote.jar:7.0.53]
at org.apache.coyote.http11.InternalOutputBuffer.flush(InternalOutputBuffer.java:119) ~[tomcat-coyote.jar:7.0.53]
at org.apache.coyote.http11.AbstractHttp11Processor.action(AbstractHttp11Processor.java:799) ~[tomcat-coyote.jar:7.0.53]
at org.apache.coyote.Response.action(Response.java:174) ~[tomcat-coyote.jar:7.0.53]
at org.apache.catalina.connector.OutputBuffer.doFlush(OutputBuffer.java:366) ~[catalina.jar:7.0.53]
... 19 common frames omitted
I do not think it will be possible to account for all the possible failures by adding the code around SSE sockets even if Jersey adds all information accessible from the socket interface. The only viable solution is proper two way communication. In case of SSE chunked output stream, pulled cable does not cause any interruption because nothing is supposed to tell it that remote host is now unreachable (until OS closes the socket).
Your first step is right - implement heartbeats every N seconds. Then all you need to do is to report back with another tiny http call every so often that you still listen. It is up to you to do acknowledgments every 5 seconds or every minute - depends on how fast do you need a problem detection.
You can do it in the same Jersey Resource by implementing #POST (in RESTful terms it reads "create new ack that you receive events").
Note: browsers are good at re-establishing SSE connection on their own in case of network interrupts, no need to fiddle with it.
I'm loading a large number of nodes and relationships into an embedded Neo4j database. After about 10,000 inserts, it dies. If I stay under that point, then everything works great. Queries return as they should, as do inserts. It looks like somehow a database file is getting deleted in the middle of the inserts, which is causing everything to fall apart. My database builds itself from scratch, so if I completely delete my graphdb folder and restart it, it runs exactly the same every time. So how do you handle large embedded Neo4j databases?
Here are the pertinent errors.
From the Java output side
The transactions start not committing:
WorkerThread exception::org.neo4j.graphdb.TransactionFailureException::Unable to commit transaction
org.neo4j.graphdb.TransactionFailureException: Unable to commit transaction
at org.neo4j.kernel.TopLevelTransaction.close(TopLevelTransaction.java:140)
...
at java.lang.Thread.run(Thread.java:745)
Caused by: org.neo4j.graphdb.TransactionFailureException: commit threw exception
at org.neo4j.kernel.impl.transaction.TxManager.commit(TxManager.java:500)
at org.neo4j.kernel.impl.transaction.TxManager.commit(TxManager.java:385)
at org.neo4j.kernel.impl.transaction.TransactionImpl.commit(TransactionImpl.java:123)
at org.neo4j.kernel.TopLevelTransaction.close(TopLevelTransaction.java:124)
... 4 more
Caused by: javax.transaction.xa.XAException
Then it informs me that there's a missing file in the database:
saction.TransactionImpl.doCommit(TransactionImpl.java:560)
at org.neo4j.kernel.impl.transaction.TxManager.commit(TxManager.java:448)
... 7 more
Caused by: org.neo4j.kernel.impl.nioneo.store.UnderlyingStorageException: java.io.FileNotFoundException: /home/user/graphdb/schema/label/lucene/_1z6.frq (Protocol error)
at org.neo4j.kernel.impl.nioneo.xa.NeoStoreTransaction.updateLabelScanStore(NeoStoreTransaction.java:814)
at org.neo4j.kernel.impl.nioneo.xa.NeoStoreTransaction.applyCommit(NeoStoreTransaction.java:699)
at org.neo4j.kernel.impl.nioneo.xa.NeoStoreTransaction.doCommit(NeoStoreTransaction.java:631)
at org.neo4j.kernel.impl.transaction.xaframework.XaTransaction.commit(XaTransaction.java:327)
at org.neo4j.kernel.impl.transaction.xaframework.XaResourceManager.commitWriteTx(XaResourceManager.java:632)
at org.neo4j.kernel.impl.transaction.xaframework.XaResourceManager.commit(XaResourceManager.java:533)
at org.neo4j.kernel.impl.transaction.xaframework.XaResourceHelpImpl.commit(XaResourceHelpImpl.java:64)
at org.neo4j.kernel.impl.transaction.TransactionImpl.doCommit(TransactionImpl.java:548)
... 8 more
Caused by: java.io.FileNotFoundException: /home/user/graphdb/schema/label/lucene/_1z6.frq (Protocol error)
at java.io.RandomAccessFile.open(Native Method)
at java.io.RandomAccessFile.<init>(RandomAccessFile.java:241)
at org.apache.lucene.store.FSDirectory$FSIndexOutput.<init>(FSDirectory.java:441)
at org.apache.lucene.store.FSDirectory.createOutput(FSDirectory.java:306)
at org.apache.lucene.index.FormatPostingsDocsWriter.<init>(FormatPostingsDocsWriter.java:47)
at org.apache.lucene.index.FormatPostingsTermsWriter.<init>(FormatPostingsTermsWriter.java:33)
at org.apache.lucene.index.FormatPostingsFieldsWriter.<init>(FormatPostingsFieldsWriter.java:51)
at org.apache.lucene.index.FreqProxTermsWriter.flush(FreqProxTermsWriter.java:85)
at org.apache.lucene.index.TermsHash.flush(TermsHash.java:113)
at org.apache.lucene.index.DocInverter.flush(DocInverter.java:70)
at org.apache.lucene.index.DocFieldProcessor.flush(DocFieldProcessor.java:60)
at org.apache.lucene.index.DocumentsWriter.flush(DocumentsWriter.java:581)
at org.apache.lucene.index.IndexWriter.doFlush(IndexWriter.java:3587)
at org.apache.lucene.index.IndexWriter.flush(IndexWriter.java:3552)
at org.apache.lucene.index.IndexWriter.getReader(IndexWriter.java:450)
at org.apache.lucene.index.IndexWriter.getReader(IndexWriter.java:399)
at org.apache.lucene.index.DirectoryReader.doOpenFromWriter(DirectoryReader.java:413)
at org.apache.lucene.index.DirectoryReader.doOpenIfChanged(DirectoryReader.java:432)
at org.apache.lucene.index.DirectoryReader.doOpenIfChanged(DirectoryReader.java:375)
at org.apache.lucene.index.IndexReader.openIfChanged(IndexReader.java:508)
at org.apache.lucene.search.SearcherManager.refreshIfNeeded(SearcherManager.java:109)
at org.apache.lucene.search.SearcherManager.refreshIfNeeded(SearcherManager.java:57)
at org.apache.lucene.search.ReferenceManager.maybeRefresh(ReferenceManager.java:137)
at org.neo4j.kernel.api.impl.index.LuceneLabelScanStore.refreshSearcher(LuceneLabelScanStore.java:159)
at org.neo4j.kernel.api.impl.index.LuceneLabelScanWriter.close(LuceneLabelScanWriter.java:82)
at org.neo4j.kernel.impl.nioneo.xa.NeoStoreTransaction.updateLabelScanStore(NeoStoreTransaction.java:811)
... 15 more
Then I can no longer get a new transaction:
WorkerThread exception::org.neo4j.graphdb.TransactionFailureException::Unable to get transaction.
org.neo4j.graphdb.TransactionFailureException: Unable to get transaction.
at org.neo4j.kernel.InternalAbstractGraphDatabase.transactionRunning(InternalAbstractGraphDatabase.java:1064)
at org.neo4j.kernel.InternalAbstractGraphDatabase.beginTx(InternalAbstractGraphDatabase.java:1037)
at org.neo4j.kernel.TransactionBuilderImpl.begin(TransactionBuilderImpl.java:43)
at org.neo4j.kernel.InternalAbstractGraphDatabase.beginTx(InternalAbstractGraphDatabase.java:1024)
...
at java.lang.Thread.run(Thread.java:745)
Caused by: javax.transaction.SystemException: Kernel has encountered some problem, please perform neccesary action (tx recovery/restart)
at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:57)
at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
at java.lang.reflect.Constructor.newInstance(Constructor.java:526)
at org.neo4j.kernel.impl.transaction.KernelHealth.assertHealthy(KernelHealth.java:61)
at org.neo4j.kernel.impl.transaction.TxManager.assertTmOk(TxManager.java:339)
at org.neo4j.kernel.impl.transaction.TxManager.getTransaction(TxManager.java:725)
at org.neo4j.kernel.InternalAbstractGraphDatabase.transactionRunning(InternalAbstractGraphDatabase.java:1060)
... 7 more
Caused by: javax.transaction.xa.XAException
at org.neo4j.kernel.impl.transaction.TransactionImpl.doCommit(TransactionImpl.java:560)
at org.neo4j.kernel.impl.transaction.TxManager.commit(TxManager.java:448)
at org.neo4j.kernel.impl.transaction.TxManager.commit(TxManager.java:385)
at org.neo4j.kernel.impl.transaction.TransactionImpl.commit(TransactionImpl.java:123)
at org.neo4j.kernel.TopLevelTransaction.close(TopLevelTransaction.java:124)
... 4 more
Caused by: org.neo4j.kernel.impl.nioneo.store.UnderlyingStorageException: java.io.FileNotFoundException: /home/user/graphdb/schema/label/lucene/_1z6.frq (Protocol error)
at org.neo4j.kernel.impl.nioneo.xa.NeoStoreTransaction.updateLabelScanStore(NeoStoreTransaction.java:814)
at org.neo4j.kernel.impl.nioneo.xa.NeoStoreTransaction.applyCommit(NeoStoreTransaction.java:699)
at org.neo4j.kernel.impl.nioneo.xa.NeoStoreTransaction.doCommit(NeoStoreTransaction.java:631)
at org.neo4j.kernel.impl.transaction.xaframework.XaTransaction.commit(XaTransaction.java:327)
at org.neo4j.kernel.impl.transaction.xaframework.XaResourceManager.commitWriteTx(XaResourceManager.java:632)
at org.neo4j.kernel.impl.transaction.xaframework.XaResourceManager.commit(XaResourceManager.java:533)
at org.neo4j.kernel.impl.transaction.xaframework.XaResourceHelpImpl.commit(XaResourceHelpImpl.java:64)
at org.neo4j.kernel.impl.transaction.TransactionImpl.doCommit(TransactionImpl.java:548)
... 8 more
Caused by: java.io.FileNotFoundException: /home/user/graphdb/schema/label/lucene/_1z6.frq (Protocol error)
at java.io.RandomAccessFile.open(Native Method)
at java.io.RandomAccessFile.<init>(RandomAccessFile.java:241)
at org.apache.lucene.store.FSDirectory$FSIndexOutput.<init>(FSDirectory.java:441)
at org.apache.lucene.store.FSDirectory.createOutput(FSDirectory.java:306)
at org.apache.lucene.index.FormatPostingsDocsWriter.<init>(FormatPostingsDocsWriter.java:47)
at org.apache.lucene.index.FormatPostingsTermsWriter.<init>(FormatPostingsTermsWriter.java:33)
at org.apache.lucene.index.FormatPostingsFieldsWriter.<init>(FormatPostingsFieldsWriter.java:51)
at org.apache.lucene.index.FreqProxTermsWriter.flush(FreqProxTermsWriter.java:85)
at org.apache.lucene.index.TermsHash.flush(TermsHash.java:113)
at org.apache.lucene.index.DocInverter.flush(DocInverter.java:70)
at org.apache.lucene.index.DocFieldProcessor.flush(DocFieldProcessor.java:60)
at org.apache.lucene.index.DocumentsWriter.flush(DocumentsWriter.java:581)
at org.apache.lucene.index.IndexWriter.doFlush(IndexWriter.java:3587)
at org.apache.lucene.index.IndexWriter.flush(IndexWriter.java:3552)
at org.apache.lucene.index.IndexWriter.getReader(IndexWriter.java:450)
at org.apache.lucene.index.IndexWriter.getReader(IndexWriter.java:399)
at org.apache.lucene.index.DirectoryReader.doOpenFromWriter(DirectoryReader.java:413)
at org.apache.lucene.index.DirectoryReader.doOpenIfChanged(DirectoryReader.java:432)
at org.apache.lucene.index.DirectoryReader.doOpenIfChanged(DirectoryReader.java:375)
at org.apache.lucene.index.IndexReader.openIfChanged(IndexReader.java:508)
at org.apache.lucene.search.SearcherManager.refreshIfNeeded(SearcherManager.java:109)
at org.apache.lucene.search.SearcherManager.refreshIfNeeded(SearcherManager.java:57)
at org.apache.lucene.search.ReferenceManager.maybeRefresh(ReferenceManager.java:137)
at org.neo4j.kernel.api.impl.index.LuceneLabelScanStore.refreshSearcher(LuceneLabelScanStore.java:159)
at org.neo4j.kernel.api.impl.index.LuceneLabelScanWriter.close(LuceneLabelScanWriter.java:82)
at org.neo4j.kernel.impl.nioneo.xa.NeoStoreTransaction.updateLabelScanStore(NeoStoreTransaction.java:811)
... 15 more
From the messages.log side
It starts off getting memory mapping errors. This actually happens first when the database first comes online. But more trickle in before it totally dies:
2015-01-27 21:51:29.112+0000 ERROR [org.neo4j]: [/home/user/graphdb/neostore.nodestore.db] Unable to memory map Unable to map pos=0 recordSize=15 totalSize=1048575
org.neo4j.kernel.impl.nioneo.store.MappedMemException: Unable to map pos=0 recordSize=15 totalSize=1048575
at org.neo4j.kernel.impl.nioneo.store.MappedPersistenceWindow.<init>(MappedPersistenceWindow.java:59)
at org.neo4j.kernel.impl.nioneo.store.PersistenceWindowPool.allocateNewWindow(PersistenceWindowPool.java:656)
at org.neo4j.kernel.impl.nioneo.store.PersistenceWindowPool.expandBricks(PersistenceWindowPool.java:617)
at org.neo4j.kernel.impl.nioneo.store.PersistenceWindowPool.acquire(PersistenceWindowPool.java:144)
at org.neo4j.kernel.impl.nioneo.store.CommonAbstractStore.acquireWindow(CommonAbstractStore.java:546)
at org.neo4j.kernel.impl.nioneo.store.NodeStore.forceGetRecord(NodeStore.java:149)
at org.neo4j.kernel.impl.nioneo.xa.NeoStoreIndexStoreView$NodeStoreScan.run(NeoStoreIndexStoreView.java:327)
at org.neo4j.kernel.impl.api.index.IndexPopulationJob.indexAllNodes(IndexPopulationJob.java:212)
at org.neo4j.kernel.impl.api.index.IndexPopulationJob.run(IndexPopulationJob.java:107)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
at java.util.concurrent.FutureTask.run(FutureTask.java:262)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:745)
Caused by: java.io.IOException: Invalid argument
at sun.nio.ch.FileChannelImpl.map0(Native Method)
at sun.nio.ch.FileChannelImpl.map(FileChannelImpl.java:875)
at org.neo4j.kernel.impl.nioneo.store.StoreFileChannel.map(StoreFileChannel.java:57)
at org.neo4j.kernel.impl.nioneo.store.MappedPersistenceWindow.<init>(MappedPersistenceWindow.java:53)
... 13 more
Then, and I've figured out that this is when everything falls apart, I get this error in messages.log:
2015-01-27 21:59:41.516+0000 ERROR [org.neo4j]: setting TM not OK. Kernel has encountered some problem, please perform neccesary action (tx recovery/restart) null
javax.transaction.xa.XAException
at org.neo4j.kernel.impl.transaction.TransactionImpl.doCommit(TransactionImpl.java:560)
at org.neo4j.kernel.impl.transaction.TxManager.commit(TxManager.java:448)
at org.neo4j.kernel.impl.transaction.TxManager.commit(TxManager.java:385)
at org.neo4j.kernel.impl.transaction.TransactionImpl.commit(TransactionImpl.java:123)
at org.neo4j.kernel.TopLevelTransaction.close(TopLevelTransaction.java:124)
...
at java.lang.Thread.run(Thread.java:745)
Caused by: org.neo4j.kernel.impl.nioneo.store.UnderlyingStorageException: java.io.FileNotFoundException: /home/user/graphdb/schema/label/lucene/_1z6.frq (Protocol error)
at org.neo4j.kernel.impl.nioneo.xa.NeoStoreTransaction.updateLabelScanStore(NeoStoreTransaction.java:814)
at org.neo4j.kernel.impl.nioneo.xa.NeoStoreTransaction.applyCommit(NeoStoreTransaction.java:699)
at org.neo4j.kernel.impl.nioneo.xa.NeoStoreTransaction.doCommit(NeoStoreTransaction.java:631)
at org.neo4j.kernel.impl.transaction.xaframework.XaTransaction.commit(XaTransaction.java:327)
at org.neo4j.kernel.impl.transaction.xaframework.XaResourceManager.commitWriteTx(XaResourceManager.java:632)
at org.neo4j.kernel.impl.transaction.xaframework.XaResourceManager.commit(XaResourceManager.java:533)
at org.neo4j.kernel.impl.transaction.xaframework.XaResourceHelpImpl.commit(XaResourceHelpImpl.java:64)
at org.neo4j.kernel.impl.transaction.TransactionImpl.doCommit(TransactionImpl.java:548)
... 8 more
Caused by: java.io.FileNotFoundException: /home/user/graphdb/schema/label/lucene/_1z6.frq (Protocol error)
at java.io.RandomAccessFile.open(Native Method)
at java.io.RandomAccessFile.<init>(RandomAccessFile.java:241)
at org.apache.lucene.store.FSDirectory$FSIndexOutput.<init>(FSDirectory.java:441)
at org.apache.lucene.store.FSDirectory.createOutput(FSDirectory.java:306)
at org.apache.lucene.index.FormatPostingsDocsWriter.<init>(FormatPostingsDocsWriter.java:47)
at org.apache.lucene.index.FormatPostingsTermsWriter.<init>(FormatPostingsTermsWriter.java:33)
at org.apache.lucene.index.FormatPostingsFieldsWriter.<init>(FormatPostingsFieldsWriter.java:51)
at org.apache.lucene.index.FreqProxTermsWriter.flush(FreqProxTermsWriter.java:85)
at org.apache.lucene.index.TermsHash.flush(TermsHash.java:113)
at org.apache.lucene.index.DocInverter.flush(DocInverter.java:70)
at org.apache.lucene.index.DocFieldProcessor.flush(DocFieldProcessor.java:60)
at org.apache.lucene.index.DocumentsWriter.flush(DocumentsWriter.java:581)
at org.apache.lucene.index.IndexWriter.doFlush(IndexWriter.java:3587)
at org.apache.lucene.index.IndexWriter.flush(IndexWriter.java:3552)
at org.apache.lucene.index.IndexWriter.getReader(IndexWriter.java:450)
at org.apache.lucene.index.IndexWriter.getReader(IndexWriter.java:399)
at org.apache.lucene.index.DirectoryReader.doOpenFromWriter(DirectoryReader.java:413)
at org.apache.lucene.index.DirectoryReader.doOpenIfChanged(DirectoryReader.java:432)
at org.apache.lucene.index.DirectoryReader.doOpenIfChanged(DirectoryReader.java:375)
at org.apache.lucene.index.IndexReader.openIfChanged(IndexReader.java:508)
at org.apache.lucene.search.SearcherManager.refreshIfNeeded(SearcherManager.java:109)
at org.apache.lucene.search.SearcherManager.refreshIfNeeded(SearcherManager.java:57)
at org.apache.lucene.search.ReferenceManager.maybeRefresh(ReferenceManager.java:137)
at org.neo4j.kernel.api.impl.index.LuceneLabelScanStore.refreshSearcher(LuceneLabelScanStore.java:159)
at org.neo4j.kernel.api.impl.index.LuceneLabelScanWriter.close(LuceneLabelScanWriter.java:82)
at org.neo4j.kernel.impl.nioneo.xa.NeoStoreTransaction.updateLabelScanStore(NeoStoreTransaction.java:811)
... 15 more
2015-01-27 21:59:41.519+0000 ERROR [org.neo4j]: TM error tx commit commit threw exception
Any ideas on what's causing that .frq file to disappear?
We resolved it in a side-conversation, maximum open files was too low (4000) which is also reported at startup.
That causes Lucene to break internally.
After increasing the limit the OP could import the data successfully.