SSTable loader streaming failed with java.io.IOException: Connection reset by peer

I am trying to use sstableloader to stream data to a Cassandra database, which is in fact on the same node. This used to work when I was running DSE 2.2, but after I upgraded to DSE 4.5 and made all the relevant changes in the cassandra.yaml file, it stopped working and now throws an error like this:
Established connection to initial hosts
Opening sstables and calculating sections to stream
Streaming relevant part of demo/test_yale/demo-test_yale-jb-2-Data.db demo/test_yale/demo-test_yale-jb-1-Data.db to [/127.0.0.1]
Streaming session ID: 02225ef0-1c17-11e4-a1ea-5f2d4f6a32c1
progress: [/127.0.0.1 1/2 (88%)] [total: 88% - 2147483647MB/s (avg: 14MB/s)]ERROR 16:36:29,029 [Stream #02225ef0-1c17-11e4-a1ea-5f2d4f6a32c1] Streaming error occurred
java.io.IOException: Connection reset by peer
at sun.nio.ch.FileDispatcherImpl.write0(Native Method)
at sun.nio.ch.SocketDispatcher.write(SocketDispatcher.java:47)
at sun.nio.ch.IOUtil.writeFromNativeBuffer(IOUtil.java:93)
at sun.nio.ch.IOUtil.write(IOUtil.java:65)
at sun.nio.ch.SocketChannelImpl.write(SocketChannelImpl.java:487)
at java.nio.channels.Channels.writeFullyImpl(Channels.java:78)
at java.nio.channels.Channels.writeFully(Channels.java:98)
at java.nio.channels.Channels.access$000(Channels.java:61)
at java.nio.channels.Channels$1.write(Channels.java:174)
at com.ning.compress.lzf.LZFChunk.writeCompressedHeader(LZFChunk.java:77)
at com.ning.compress.lzf.ChunkEncoder.encodeAndWriteChunk(ChunkEncoder.java:132)
at com.ning.compress.lzf.LZFOutputStream.writeCompressedBlock(LZFOutputStream.java:203)
at com.ning.compress.lzf.LZFOutputStream.write(LZFOutputStream.java:97)
at org.apache.cassandra.streaming.StreamWriter.write(StreamWriter.java:151)
at org.apache.cassandra.streaming.StreamWriter.write(StreamWriter.java:101)
at org.apache.cassandra.streaming.messages.OutgoingFileMessage$1.serialize(OutgoingFileMessage.java:59)
at org.apache.cassandra.streaming.messages.OutgoingFileMessage$1.serialize(OutgoingFileMessage.java:42)
at org.apache.cassandra.streaming.messages.StreamMessage.serialize(StreamMessage.java:45)
at org.apache.cassandra.streaming.ConnectionHandler$OutgoingMessageHandler.sendMessage(ConnectionHandler.java:383)
at org.apache.cassandra.streaming.ConnectionHandler$OutgoingMessageHandler.run(ConnectionHandler.java:363)
at java.lang.Thread.run(Thread.java:745)
WARN 16:36:29,032 [Stream #02225ef0-1c17-11e4-a1ea-5f2d4f6a32c1] Stream failed
Streaming to the following hosts failed:
[/127.0.0.1]
java.util.concurrent.ExecutionException: org.apache.cassandra.streaming.StreamException: Stream failed
I have even tried assigning the actual IP address of the node to listen_address, broadcast_address, and rpc_address in the cassandra.yaml file, but the same error occurs.
Can anyone be of assistance, please?

It's worth looking at your system.log, whose location is specified in cassandra/conf/logback.xml, as Zanson suggested.
In my case the issue was simply exhausted disk space on the node:
ERROR [STREAM-IN-/xx.xx.xx.xx] 2016-08-02 10:50:31,125 StreamSession.java:505 - [Stream #8420bfa0-589c-11e6-9512-235b1f79cf1b] Streaming error occurred
java.io.IOException: No space left on device
at java.io.RandomAccessFile.writeBytes(Native Method) ~[na:1.8.0_101]
at java.io.RandomAccessFile.write(RandomAccessFile.java:525) ~[na:1.8.0_101]
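If you want to rule that out before kicking off a load, usable space is easy to check up front; here is a minimal sketch, assuming the default data directory /var/lib/cassandra/data (an assumption, not from the original post; adjust to your data_file_directories setting):

import java.io.File;

public class DiskSpaceCheck {
    public static void main(String[] args) {
        // Assumption: default Cassandra data directory path
        File dataDir = new File("/var/lib/cassandra/data");
        long usableGiB = dataDir.getUsableSpace() / (1024L * 1024 * 1024);
        System.out.printf("Usable space on %s: %d GiB%n", dataDir, usableGiB);
        if (usableGiB < 10) { // the threshold is arbitrary, for illustration only
            System.err.println("Low disk space: streaming may fail with 'No space left on device'");
        }
    }
}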

Related

HDFS: namenode.FSEditLog (JournalSet.java:mapJournalsAndReportErrors(390)) - Error: flush failed for required journal

On one of our platforms, the HDFS namenode shuts down with the following error message every one to three days:
FATAL namenode.FSEditLog (JournalSet.java:mapJournalsAndReportErrors(390)) - Error: flush failed for required journal (JournalAndStream(mgr=QJM to [<ip1>:<port>,<ip2>:<port>, etc], stream=QuorumOutputStream starting at txid 29873171))
java.io.IOException: Timed out waiting 20000ms for a quorum of nodes to respond.
at org.apache.hadoop.hdfs.qjournal.client.AsyncLoggerSet.waitForWriteQuorum(AsyncLoggerSet.java:137)
at org.apache.hadoop.hdfs.qjournal.client.QuorumOutputStream.flushAndSync(QuorumOutputStream.java:109)
at org.apache.hadoop.hdfs.server.namenode.EditLogOutputStream.flush(EditLogOutputStream.java:113)
at org.apache.hadoop.hdfs.server.namenode.EditLogOutputStream.flush(EditLogOutputStream.java:107)
at org.apache.hadoop.hdfs.server.namenode.JournalSet$JournalSetOutputStream$8.apply(JournalSet.java:525)
at org.apache.hadoop.hdfs.server.namenode.JournalSet.mapJournalsAndReportErrors(JournalSet.java:385)
at org.apache.hadoop.hdfs.server.namenode.JournalSet.access$100(JournalSet.java:55)
at org.apache.hadoop.hdfs.server.namenode.JournalSet$JournalSetOutputStream.flush(JournalSet.java:521)
at org.apache.hadoop.hdfs.server.namenode.FSEditLog.logSync(FSEditLog.java:710)
at org.apache.hadoop.hdfs.server.namenode.FSEditLogAsync.run(FSEditLogAsync.java:188)
at java.lang.Thread.run(Thread.java:748)
Before this FATAL log we see warnings like the following, which show a degradation in response time:
WARN client.QuorumJournalManager (QuorumCall.java:waitFor(185)) - Waited 18014 ms (timeout=20000 ms) for a response for sendEdits. Succeeded so far: [<ip1>:<port>,<ip2>:<port>]
Has anyone already encountered this problem, and do you have any advice on how to fix it?
We have already:
checked that our VMs are time-synchronized
detected that a burst of data on the network is in progress when the problem occurs, without yet finding the root cause
checked our network devices; apart from a problem on one port that flips quickly from the UP state to the DOWN state, which we are going to fix, the network seems correct
Thanks in advance
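One knob directly related to the trace: the "Timed out waiting 20000ms" matches the default quorum-journal write timeout, controlled by the standard HDFS property dfs.qjournal.write-txns.timeout.ms. Raising it only buys headroom during network bursts and does not fix the root cause, but it can keep the namenode alive while you investigate. A minimal sketch follows (normally this is set in hdfs-site.xml; the 60-second value is an arbitrary example, and the code needs hadoop-common on the classpath):

import org.apache.hadoop.conf.Configuration;

public class QjmTimeoutSketch {
    public static void main(String[] args) {
        Configuration conf = new Configuration();
        // Default is 20000 ms, which is exactly the timeout seen in the FATAL log
        conf.set("dfs.qjournal.write-txns.timeout.ms", "60000");
        System.out.println(conf.get("dfs.qjournal.write-txns.timeout.ms"));
    }
}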

java.net.BindException: Address already in use: connect (I did read the others before posting)

I know there are a ton of these posts, but this is a little different. We are using vended code for part of our data processing system, and part of that system sends emails to clients if certain events take place on data insertion or deletion. Recently we have started getting "address already in use" exceptions. We checked the repository history, and nothing in our code for this system has changed in the last six months. We have already tried the typical solutions for this issue, including increasing the number of connections allowed to the port, with little success.
We had a meeting with the vendor, and I asked whether anything had changed in their code and whether they could assure us that all connections in their code are explicitly closed. They indicated that they are explicitly closing all sockets. However, they didn't show us the code, so there is no way for us to know whether this is true other than taking their word for it.
So the only thing I can think of is to keep increasing the number of connections to the port until we stop getting bind exceptions. Is there an industry standard for the maximum number of connections to port 25? If anyone has any other suggestions, I would greatly appreciate them. Thanks so much in advance, Robert
20210505112127.716 ERROR m.fiserv.ppx.business.notification.EmailNotifier : MessagingException from notify
javax.mail.MessagingException: Could not connect to SMTP host: SERVER.URL.COM, port: 25;
nested exception is:
java.net.BindException: Address already in use: connect
at com.sun.mail.smtp.SMTPTransport.openServer(SMTPTransport.java:1545)
at com.sun.mail.smtp.SMTPTransport.protocolConnect(SMTPTransport.java:453)
Caused by:
java.net.BindException: Address already in use: connect
at java.net.DualStackPlainSocketImpl.connect0(Native Method)
at java.net.DualStackPlainSocketImpl.socketConnect(DualStackPlainSocketImpl.java:90)
at java.net.AbstractPlainSocketImpl.doConnect(AbstractPlainSocketImpl.java:380)
20210505131529.950 ERROR erv.ppx.web.controller.AuditReportViewController : Error while generating HTML
net.sf.jasperreports.engine.JRException: Error writing to OutputStream writer : CorpAdminAuditReport
at net.sf.jasperreports.engine.export.JRHtmlExporter.exportReport(JRHtmlExporter.java:496)
at com.fiserv.ppx.web.controller.AuditReportViewController.generateReport(AuditReportViewController.java:184)
Caused by:
com.ibm.wsspi.webcontainer.ClosedConnectionException: OutputStream encountered error during write
at com.ibm.ws.webcontainer.channel.WCCByteBufferOutputStream.write(WCCByteBufferOutputStream.java:188)
at com.ibm.ws.webcontainer.srt.SRTOutputStream.write(SRTOutputStream.java:97)
20210505140706.240 ERROR com.fiserv.ppx.business.db.DBConnectionUtil : Exception in getting for AppServer connection from DataSource.
com.ibm.websphere.ce.cm.ConnectionWaitTimeoutException: J2CA1010E: Connection not available; timed out waiting for 180,005 seconds.
at com.ibm.ws.rsadapter.AdapterUtil.toSQLException(AdapterUtil.java:1680)
at com.ibm.ws.rsadapter.jdbc.WSJdbcDataSource.getConnection(WSJdbcDataSource.java:661)
at com.ibm.ws.rsadapter.jdbc.WSJdbcDataSource.getConnection(WSJdbcDataSource.java:611)
Caused by:
com.ibm.websphere.ce.j2c.ConnectionWaitTimeoutException: J2CA1010E: Connection not available; timed out waiting for 180,005 seconds.
at com.ibm.ejs.j2c.FreePool.createOrWaitForConnection(FreePool.java:1781)
at com.ibm.ejs.j2c.PoolManager.reserve(PoolManager.java:3834)
at com.ibm.ejs.j2c.PoolManager.reserve(PoolManager.java:3082)
20210505140731.341 ERROR com.fiserv.ppx.business.db.DBConnectionUtil : Exception in getting for AppServer connection from DataSource.
com.ibm.websphere.ce.cm.ConnectionWaitTimeoutException: J2CA1010E: Connection not available; timed out waiting for 180,010 seconds.
at com.ibm.ws.rsadapter.AdapterUtil.toSQLException(AdapterUtil.java:1680)
at com.ibm.ws.rsadapter.jdbc.WSJdbcDataSource.getConnection(WSJdbcDataSource.java:661)
at com.ibm.ws.rsadapter.jdbc.WSJdbcDataSource.getConnection(WSJdbcDataSource.java:611)
Caused by:
com.ibm.websphere.ce.j2c.ConnectionWaitTimeoutException: J2CA1010E: Connection not available; timed out waiting for 180,010 seconds.
at com.ibm.ejs.j2c.FreePool.createOrWaitForConnection(FreePool.java:1781)
at com.ibm.ejs.j2c.PoolManager.reserve(PoolManager.java:3904)
at com.ibm.ejs.j2c.PoolManager.reserve(PoolManager.java:3082)
20210505140731.341 ERROR com.fiserv.ppx.sso.controller.SSOController : SSO Configuration error
java.lang.NullPointerException
at com.fiserv.ppx.business.db.PPXDbTransactionManager.<init>(PPXDbTransactionManager.java:60)
at com.fiserv.ppx.sso.impl.SSOLoginAuthenticator.authenticateSSOUser(SSOLoginAuthenticator.java:157)
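For what it's worth, java.net.BindException: Address already in use on an outbound connect (as opposed to a server-side bind) usually means the client has run out of ephemeral ports, typically because sockets are opened faster than they are released or pile up in TIME_WAIT; raising connection limits on the receiving port won't help with that. The pattern the vendor claims to follow would look something like this minimal JavaMail sketch (the host is the placeholder from the log; all other settings are illustrative):

import java.util.Properties;
import javax.mail.Message;
import javax.mail.MessagingException;
import javax.mail.Session;
import javax.mail.Transport;
import javax.mail.internet.InternetAddress;
import javax.mail.internet.MimeMessage;

public class MailSender {
    public static void send(String to, String subject, String body) throws MessagingException {
        Properties props = new Properties();
        props.put("mail.smtp.host", "SERVER.URL.COM"); // placeholder from the log
        props.put("mail.smtp.port", "25");
        Session session = Session.getInstance(props);

        MimeMessage msg = new MimeMessage(session);
        msg.setRecipient(Message.RecipientType.TO, new InternetAddress(to));
        msg.setSubject(subject);
        msg.setText(body);

        Transport transport = session.getTransport("smtp");
        try {
            transport.connect();
            transport.sendMessage(msg, msg.getAllRecipients());
        } finally {
            transport.close(); // releases the client-side ephemeral port promptly
        }
    }
}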

Connection to Postgres database lost

On a performance server under rather heavy load, I see a weird behavior.
From one moment in time, all connections to the database start to say "connection has been closed".
The only hint so far is this IOException:
Caused by: org.postgresql.util.PSQLException: An I/O error occurred while sending to the backend.
at org.postgresql.core.v3.QueryExecutorImpl.execute(QueryExecutorImpl.java:314)
at org.postgresql.jdbc.PgStatement.executeInternal(PgStatement.java:430)
at org.postgresql.jdbc.PgStatement.execute(PgStatement.java:356)
at org.postgresql.jdbc.PgPreparedStatement.executeWithFlags(PgPreparedStatement.java:168)
at org.postgresql.jdbc.PgPreparedStatement.executeQuery(PgPreparedStatement.java:116)
at org.jboss.resource.adapter.jdbc.WrappedPreparedStatement.executeQuery(WrappedPreparedStatement.java:342)
at org.hibernate.jdbc.AbstractBatcher.getResultSet(AbstractBatcher.java:208)
at org.hibernate.loader.Loader.getResultSet(Loader.java:1812)
at org.hibernate.loader.Loader.doQuery(Loader.java:697)
at org.hibernate.loader.Loader.doQueryAndInitializeNonLazyCollections(Loader.java:259)
at org.hibernate.loader.Loader.doList(Loader.java:2232)
... 73 more
Caused by: java.io.IOException: Tried to send an out-of-range integer as a 2-byte value: 33001
at org.postgresql.core.PGStream.sendInteger2(PGStream.java:211)
at org.postgresql.core.v3.QueryExecutorImpl.sendParse(QueryExecutorImpl.java:1409)
at org.postgresql.core.v3.QueryExecutorImpl.sendOneQuery(QueryExecutorImpl.java:1729)
at org.postgresql.core.v3.QueryExecutorImpl.sendQuery(QueryExecutorImpl.java:1294)
at org.postgresql.core.v3.QueryExecutorImpl.execute(QueryExecutorImpl.java:280)
... 83 more
However, I can't really link it to a particular business scenario for the moment.
Any ideas?
It's a PostgreSQL driver limitation: the maximum number of parameters for a single query is 32767, the largest value that fits in a signed two-byte integer (your query hit 33001).
When a query exceeds that limit, the driver behaves erratically and closes connections. I encountered this on a JBoss server using Hibernate with PostgreSQL, and the connection closing left the connection pool in a pretty messed-up state.
This count is described in the PostgreSQL wire-protocol documentation, in the Parse message section:
"Int16 - The number of parameter data types specified".
The solution is to split that long query into smaller ones with a bounded number of parameters, as in the sketch below.
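A minimal sketch of that splitting for a large IN (...) list; the table and column names and the 30000 chunk size are illustrative (anything safely below the limit works):

import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

public class ChunkedQuery {
    private static final int MAX_PARAMS = 30000; // safely below the 32767 limit

    public static List<String> findNames(Connection conn, List<Long> ids) throws SQLException {
        List<String> names = new ArrayList<>();
        for (int from = 0; from < ids.size(); from += MAX_PARAMS) {
            List<Long> chunk = ids.subList(from, Math.min(from + MAX_PARAMS, ids.size()));
            // One placeholder per id in this chunk, never more than MAX_PARAMS
            String placeholders = String.join(",", Collections.nCopies(chunk.size(), "?"));
            String sql = "SELECT name FROM users WHERE id IN (" + placeholders + ")";
            try (PreparedStatement ps = conn.prepareStatement(sql)) {
                for (int i = 0; i < chunk.size(); i++) {
                    ps.setLong(i + 1, chunk.get(i));
                }
                try (ResultSet rs = ps.executeQuery()) {
                    while (rs.next()) {
                        names.add(rs.getString(1));
                    }
                }
            }
        }
        return names;
    }
}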

Timeout while writing to the queue-based output stream

My GAE application works fine, but at some point it throws the error below. I am trying to figure out the root cause of this "Timeout while writing to the queue-based output stream" error:
java.io.IOException: Timeout while writing to the queue-based output stream
org.restlet.engine.io.PipeStream$2.write(PipeStream.java:99)
java.io.OutputStream.write(OutputStream.java:116)
com.dropbox.core.util.IOUtil.copyStreamToStream(IOUtil.java:52)
com.dropbox.core.util.IOUtil.copyStreamToStream(IOUtil.java:63)
com.dropbox.core.util.IOUtil.copyStreamToStream(IOUtil.java:34)
com.dropbox.core.v1.DbxClientV1$Downloader.copyBodyAndClose(DbxClientV1.java:535)
com.dropbox.core.v1.DbxClientV1.getFile(DbxClientV1.java:427)
com.myapp.MyServerResource.getFile(MyServerResource.java:268)
com.myapp.MyServerResource$1.write(MyServerResource.java:140)
org.restlet.engine.io.IoUtils$2.run(IoUtils.java:537)
org.restlet.engine.Engine$1.run(Engine.java:158)
com.google.appengine.tools.development.RequestThreadFactory$1$1$2.run(RequestThreadFactory.java:110)
java.security.AccessController.doPrivileged(Native Method)
com.google.appengine.tools.development.RequestThreadFactory$1$1.run(RequestThreadFactory.java:107)
Is this really a timeout issue with URL Fetch, or an issue with the output stream copy?
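Judging from the trace, the failure is on the copy side rather than in URL Fetch: Restlet's PipeStream is a queue-backed pipe (as the message itself says), so a write into it blocks until the request thread on the other end drains it, and it times out if that never happens, for example when the client stops reading. The copy loop in the trace boils down to something like this (a simplified sketch; the buffer size is illustrative):

import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;

public class StreamCopy {
    // Copies the Dropbox download stream into the response stream.
    // Each write can block when the pipe's internal queue is full,
    // which is where the timeout in the trace fires.
    public static void copy(InputStream in, OutputStream out) throws IOException {
        byte[] buf = new byte[8192];
        int n;
        while ((n = in.read(buf)) != -1) {
            out.write(buf, 0, n);
        }
        out.flush();
    }
}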

com.mongodb.MongoException$Network Caused by: java.net.SocketTimeoutException: Read timed out

When I insert documents into MongoDB using Morphia, I keep getting com.mongodb.MongoException$Network: Write operation to server exceptions, at intervals of maybe one minute.
The following is the stack trace:
com.mongodb.MongoException$Network: Write operation to server
at com.mongodb.DBTCPConnector.say(DBTCPConnector.java:153)
at com.mongodb.DBTCPConnector.say(DBTCPConnector.java:115)
at com.mongodb.DBApiLayer$MyCollection.update(DBApiLayer.java:327)
at com.mongodb.DBCollection.update(DBCollection.java:178)
at com.mongodb.DBCollection.save(DBCollection.java:818)
at com.google.code.morphia.DatastoreImpl.save(DatastoreImpl.java:882)
at com.google.code.morphia.DatastoreImpl.save(DatastoreImpl.java:949)
at com.google.code.morphia.DatastoreImpl.save(DatastoreImpl.java:934)
at com.yeahmobi.datasystem.conversion.datarepository.mongodb.MongoTransmappingRepository.insert(MongoTransmappingRepository.java:36)
at com.yeahmobi.datasystem.conversion.datarepository.merge.MergeTransmappingRepository.insert(MergeTransmappingRepository.java:24)
at com.yeahmobi.datasystem.conversion.threads.JumpInserter.saveJumpLog(JumpInserter.java:134)
at com.yeahmobi.datasystem.conversion.threads.JumpInserter.run(JumpInserter.java:163)
at java.lang.Thread.run(Thread.java:662)
Caused by: java.net.SocketTimeoutException: Read timed out
at java.net.SocketInputStream.socketRead0(Native Method)
at java.net.SocketInputStream.read(SocketInputStream.java:129)
at java.io.BufferedInputStream.fill(BufferedInputStream.java:218)
at java.io.BufferedInputStream.read1(BufferedInputStream.java:258)
at java.io.BufferedInputStream.read(BufferedInputStream.java:317)
at org.bson.io.Bits.readFully(Bits.java:46)
at org.bson.io.Bits.readFully(Bits.java:33)
at org.bson.io.Bits.readFully(Bits.java:28)
at com.mongodb.Response.<init>(Response.java:40)
at com.mongodb.DBPort.go(DBPort.java:142)
at com.mongodb.DBPort.go(DBPort.java:106)
at com.mongodb.DBPort.findOne(DBPort.java:162)
at com.mongodb.DBPort.runCommand(DBPort.java:170)
at com.mongodb.DBTCPConnector._checkWriteError(DBTCPConnector.java:100)
at com.mongodb.DBTCPConnector.say(DBTCPConnector.java:142)
Has anyone met the same issue? Any suggestion is appreciated.
Thanks
This is unusual and should not happen often during normal operation.
Try to debug from a networking/OS perspective and check the following (a driver-side mitigation sketch follows the list):
Is the connectivity between the application and Mongo reliable? What are the packet drop rate and latency?
Is there enough network bandwidth between the application and Mongo?
Has there been any software/hardware trouble on the application/Mongo servers?
Was the server under high load when it happened?
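While you chase the network, you can also make the client more tolerant of brief stalls. A minimal sketch for the legacy 2.x Java driver (the one behind DBTCPConnector in your trace); the host, port, and 60-second value are placeholders, not recommendations:

import com.mongodb.MongoClient;
import com.mongodb.MongoClientOptions;
import com.mongodb.ServerAddress;

public class MongoClientFactory {
    public static MongoClient create() throws Exception {
        MongoClientOptions opts = MongoClientOptions.builder()
                .socketTimeout(60000)   // ms to wait on a read before SocketTimeoutException; 0 = no limit
                .socketKeepAlive(true)  // let TCP keep-alive detect silently dropped peers
                .build();
        return new MongoClient(new ServerAddress("localhost", 27017), opts);
    }
}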
