Spring Mongo's find operation freezes on Windows Server 2008 machine - java

I am currently using Spring's Mongo persistence layer to query MongoDB. The collection I query contains about 4 GB of data. When I run the find code from my IDE it retrieves the data, but when I run the same code on my server it freezes for about 15 to 20 minutes and eventually throws the error below. My concern is that it runs without a hitch from my IDE on my 4 GB RAM Windows PC and fails on my 14 GB RAM server. I have looked through the Mongo log and there is nothing there that points to the problem. I also assumed the problem might be environmental, since it works from my local Spring IDE, but the libraries on my local PC are the same as the ones on my server. Has anyone had this kind of issue, or can anyone point me to what I'm doing wrong? Oddly, the find operation works when I revert to Mongo's Java driver find methods.
I'm using mongo-java-driver - 2.12.1
spring-data-mongodb - 1.7.0.RELEASE
See below for the sample find operation code and the error message.
List<HTObject> empObjects = mongoOperations.find(new Query(Criteria.where("date").gte(dateS).lte(dateE)), HTObject.class);
The exception I get is:
09:42:01.436 [main] DEBUG o.s.data.mongodb.core.MongoDbUtils - Getting Mongo Database name=[Hansard]
Exception in thread "main" org.springframework.dao.DataAccessResourceFailureException: Cursor 185020098546 not found on server 172.30.128.155:27017; nested exception is com.mongodb.MongoException$CursorNotFound: Cursor 185020098546 not found on server 172.30.128.155:27017
at org.springframework.data.mongodb.core.MongoExceptionTranslator.translateExceptionIfPossible(MongoExceptionTranslator.java:73)
at org.springframework.data.mongodb.core.MongoTemplate.potentiallyConvertRuntimeException(MongoTemplate.java:2002)
at org.springframework.data.mongodb.core.MongoTemplate.executeFindMultiInternal(MongoTemplate.java:1885)
at org.springframework.data.mongodb.core.MongoTemplate.doFind(MongoTemplate.java:1696)
at org.springframework.data.mongodb.core.MongoTemplate.doFind(MongoTemplate.java:1679)
at org.springframework.data.mongodb.core.MongoTemplate.find(MongoTemplate.java:598)
at org.springframework.data.mongodb.core.MongoTemplate.find(MongoTemplate.java:589)
at com.sa.dbObject.TestDb.main(TestDb.java:74)
Caused by: com.mongodb.MongoException$CursorNotFound: Cursor 185020098546 not found on server 172.30.128.155:27017
at com.mongodb.QueryResultIterator.throwOnQueryFailure(QueryResultIterator.java:218)
at com.mongodb.QueryResultIterator.init(QueryResultIterator.java:198)
at com.mongodb.QueryResultIterator.initFromQueryResponse(QueryResultIterator.java:176)
at com.mongodb.QueryResultIterator.getMore(QueryResultIterator.java:141)
at com.mongodb.QueryResultIterator.hasNext(QueryResultIterator.java:127)
at com.mongodb.DBCursor._hasNext(DBCursor.java:551)
at com.mongodb.DBCursor.hasNext(DBCursor.java:571)
at org.springframework.data.mongodb.core.MongoTemplate.executeFindMultiInternal(MongoTemplate.java:1871)
... 5 more

In short
The MongoDB result cursor is no longer available on the server.
Explanation
This can happen when you are using sharding and a connection to a mongos fails over, or if you run into cursor timeouts (see http://docs.mongodb.org/manual/core/cursors/#closure-of-inactive-cursors).
You're performing a query that loads all objects into one list (mongoOperations.find). Depending on the result size, this may take a long time. Using an Iterator (or a cursor-backed stream) can help, but even iterating over huge result sets hits its limits at some point.
If you have to query very large amounts of data, you should partition the results, either with paging (which gets slower the more records you skip) or by splitting your query into smaller chunks of the range (you already have a date range, so this could work; see the sketch below).
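Here is a minimal sketch of the range-splitting approach, assuming the entity and field names from the question (HTObject, "date") and an arbitrary one-day chunk size:

import java.util.ArrayList;
import java.util.Date;
import java.util.List;
import java.util.concurrent.TimeUnit;
import org.springframework.data.mongodb.core.MongoOperations;
import org.springframework.data.mongodb.core.query.Criteria;
import org.springframework.data.mongodb.core.query.Query;

public class ChunkedFind {

    // Splits [dateS, dateE] into one-day windows so each query's cursor stays short-lived.
    public static List<HTObject> findInChunks(MongoOperations mongoOperations, Date dateS, Date dateE) {
        List<HTObject> result = new ArrayList<HTObject>();
        long chunkMillis = TimeUnit.DAYS.toMillis(1); // assumed chunk size; tune it to your data volume
        Date from = dateS;
        while (from.before(dateE)) {
            Date to = new Date(Math.min(from.getTime() + chunkMillis, dateE.getTime()));
            boolean lastChunk = !to.before(dateE);
            Criteria criteria = lastChunk
                    ? Criteria.where("date").gte(from).lte(to)  // include the upper bound once
                    : Criteria.where("date").gte(from).lt(to);
            result.addAll(mongoOperations.find(new Query(criteria), HTObject.class));
            from = to;
        }
        return result;
    }
}

Each chunk's cursor is consumed quickly, well within the server-side idle cursor timeout (10 minutes by default), so the CursorNotFound error should not occur even though the overall result set is large.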

Related

Document created even after NoHostAvailableException

I am trying to execute multiple BatchStatements in parallel with Java's ExecutorService. I want to know whether my query was successfully executed.
I have gone through:
how do I find out if the update query was successful or not in Cassandra Datastax
It says that if there is no exception, we can consider the query successful. But I am getting a NoHostAvailableException:
All host(s) tried for query failed (tried: *********************(com.datastax.driver.core.exceptions.OperationTimedOutException: [******************] Timed out waiting for server response))
But I can see my data in Cassandra. How can I know whether my document was created successfully in Cassandra? Is there any way to check?
Batches in Cassandra are different from batches in relational databases. They should be used only in a limited number of use cases, and they shouldn't be used to batch inserts/updates to multiple partitions unless it's really necessary (see the "misuse of batches" doc).
A batch will eventually be replayed even if the driver got an error back - this happens because the batch is replicated to other nodes before execution (one way to verify the write after such an error is sketched below).
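A minimal sketch, assuming a hypothetical docs table and my_keyspace keyspace (not from the question): catch the exception and read the row back to confirm whether the write actually landed.

import com.datastax.driver.core.BatchStatement;
import com.datastax.driver.core.Cluster;
import com.datastax.driver.core.ResultSet;
import com.datastax.driver.core.Session;
import com.datastax.driver.core.SimpleStatement;
import com.datastax.driver.core.exceptions.NoHostAvailableException;

public class BatchVerifySketch {
    public static void main(String[] args) {
        Cluster cluster = Cluster.builder().addContactPoint("127.0.0.1").build();
        Session session = cluster.connect("my_keyspace"); // hypothetical keyspace
        try {
            BatchStatement batch = new BatchStatement();
            batch.add(new SimpleStatement(
                    "INSERT INTO docs (id, body) VALUES (?, ?)", "doc-1", "payload")); // hypothetical table
            try {
                session.execute(batch);
            } catch (NoHostAvailableException e) {
                // The coordinator did not answer in time, but the batch may still be replayed
                // from the batch log, so the only reliable check is to read the row back.
                ResultSet rs = session.execute("SELECT id FROM docs WHERE id = ?", "doc-1");
                boolean present = rs.one() != null;
                System.out.println("Row present after error: " + present);
            }
        } finally {
            cluster.close();
        }
    }
}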

Microsoft SQL Server - Query with huge results, causes ASYNC_NETWORK_IO wait issue

I have a Java application requesting about 2.4 million records from a Microsoft SQL Server instance (Microsoft SQL Server 2008 R2 SP3).
The application runs fine on all hosts except one. On this host the application is able to retrieve data on some occasions, but on others it hangs.
Monitoring the MS SQL Server indicates that the SPID associated with the query is in an ASYNC_NETWORK_IO wait state.
There are a few links online that talk about it.
https://blogs.msdn.microsoft.com/joesack/2009/01/08/troubleshooting-async_network_io-networkio/
https://social.msdn.microsoft.com/Forums/sqlserver/en-US/6db233d5-8892-4f8a-88c7-b72d0fc59ca9/very-high-asyncnetworkio?forum=sqldatabaseengine
https://social.msdn.microsoft.com/Forums/sqlserver/en-US/1df2cab8-33ca-4870-9daf-ed333a64630c/network-packet-size-and-delay-by-sql-server-sending-data-to-client?forum=sqldatabaseengine
Based on the above, ASYNC_NETWORK_IO indicates one of two things:
1. The application is slow to process the results.
2. The network between the application and the DB has issues.
For #1 above, we analyzed tcpdumps and found that in the cases where the query goes into the ASYNC_NETWORK_IO state, the application server's TCP connection has a window size that oscillates between 0 and a small number and eventually remains stuck at 0. Based on some more analysis, issues related to firewalls between the DB and the application have also been mostly ruled out.
So I am staring at #2, unable to understand what could possibly be going wrong. It is all the more baffling because the same code has been running under similar data loads for more than a year now, and it also runs fine on other hosts.
The JDBC driver being used is sqljdbc4-4.0.jar.
By default this driver has an adaptive buffering feature, which works under the hood to reduce application resource usage.
We use the default fetch size of 128 (which I believe is not a good one).
So I am going to experiment with overriding the default adaptive buffering behavior, though the MS docs suggest that adaptive buffering is good for large result sets.
I will change the connection setting to use selectMethod=cursor,
and also change the fetch size to 1024 (a minimal sketch of both changes is shown below).
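A minimal sketch of those two changes, with a hypothetical host, database, credentials, and table; only selectMethod=cursor and the statement fetch size are the settings under discussion.

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.Statement;

public class FetchSettingsSketch {
    public static void main(String[] args) throws Exception {
        // Hypothetical host/database/credentials; selectMethod=cursor is set on the connection URL.
        String url = "jdbc:sqlserver://dbhost:1433;databaseName=MyDb;selectMethod=cursor";
        try (Connection conn = DriverManager.getConnection(url, "user", "password");
             Statement stmt = conn.createStatement()) {
            stmt.setFetchSize(1024); // ask the driver for larger batches per round trip
            try (ResultSet rs = stmt.executeQuery("SELECT * FROM big_table")) { // hypothetical table
                while (rs.next()) {
                    // consume rows promptly so the TCP receive window does not close
                }
            }
        }
    }
}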
Now, if it does not work:
What aspects of the problem are worth investigating?
Assuming it's still an issue with the client, what other connection or network settings should be inspected/changed to make progress?
If it does work consistently, what is the impact of changing the connection setting to selectMethod=cursor
on the application side?
On the database side?
Update: I tested the application after adding selectMethod=cursor to the connection. However, it results in the same issue as above.
Based on discussions with other administrators on the team, at this point the issue may be in the JDBC driver or in the OS (when it tries to handle the data coming off the network).
After a good amount of discussion with the system admin, network admin, and database admin, it was agreed that somewhere in the OS -> application stack the data from the network wasn't being handled. In the meantime, we tested a solution where we broke the query down to return smaller results: 5 queries, each returning about 500k records.
Now when we ran these queries sequentially, we still ran into the same issue.
However, when we ran the queries in parallel, it was always successful (a sketch is below).
Given that this solution always worked, we haven't bothered getting to the root cause of the problem.
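A minimal sketch of the parallel workaround, with hypothetical table/column names and id ranges standing in for the real queries:

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

public class ParallelRangeQueries {
    public static void main(String[] args) throws Exception {
        String url = "jdbc:sqlserver://dbhost:1433;databaseName=MyDb"; // hypothetical host/database
        int[][] ranges = { {0, 500000}, {500000, 1000000}, {1000000, 1500000},
                           {1500000, 2000000}, {2000000, 2500000} };     // assumed id ranges
        ExecutorService pool = Executors.newFixedThreadPool(ranges.length);
        List<Future<Integer>> futures = new ArrayList<Future<Integer>>();
        for (final int[] range : ranges) {
            futures.add(pool.submit(() -> {
                int count = 0;
                try (Connection conn = DriverManager.getConnection(url, "user", "password");
                     PreparedStatement ps = conn.prepareStatement(
                             "SELECT * FROM big_table WHERE id >= ? AND id < ?")) { // hypothetical query
                    ps.setInt(1, range[0]);
                    ps.setInt(2, range[1]);
                    try (ResultSet rs = ps.executeQuery()) {
                        while (rs.next()) {
                            count++; // consume rows promptly so the receive window stays open
                        }
                    }
                }
                return count;
            }));
        }
        int total = 0;
        for (Future<Integer> f : futures) {
            total += f.get(); // re-throws any SQLException wrapped in an ExecutionException
        }
        pool.shutdown();
        System.out.println("Fetched " + total + " rows");
    }
}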
On another note, the hardware and software running the application were also outdated (it was running Red Hat 5), so it could well have something to do with that.

Getting "Write attempt on defunct connection" Error From Datastax Cassandra Java Driver

I have a web service application using Cassandra 2.0 and the DataStax Java driver 2.0.2. I sometimes get the stack trace below when trying to write to/read from the database, especially if the application has been sitting idle for a while (like overnight). This error usually goes away when I retry; however, sometimes it persists and I have to restart the web app to get rid of it.
I wonder if this is some sort of "stale connection" issue. However, the DataStax Java driver documentation indicates it is supposed to keep the connection alive.
I did a Google search on the error message and Google gave only two (!) hits, which are related. This is the answer in one of the Google results:
Sylvain Lebresne Apr 2 You're running into
https://datastax-oss.atlassian.net/browse/JAVA-250. We'll fix it soon
hopefully (I have some half-finished patch that I need to finish), but
currently, if you restart a whole cluster without doing queries during
the restart, it can sometimes happen that you'll get this before the
cluster properly reconnect. In the meantime and as a workaround, you
can always make sure to run a few trivial queries while you're doing
the cluster restart to avoid it.
However, this does not look like my scenario because we are not restarting the cluster at all. Does anyone have insight into this error?
Stacktrace:
com.datastax.driver.core.exceptions.NoHostAvailableException: All host(s) tried for query failed (tried: ec2-54-197-xxx-xxx.compute-1.amazonaws.com/54.197.xxx.xxx:9042 (com.datastax.driver.core.ConnectionException: [ec2-54-197-xxx-xxx.compute-1.amazonaws.com/54.197.xxx.xxx:9042] Write attempt on defunct connection))
at com.datastax.driver.core.exceptions.NoHostAvailableException.copy(NoHostAvailableException.java:65)
at com.datastax.driver.core.DefaultResultSetFuture.extractCauseFromExecutionException(DefaultResultSetFuture.java:256)
at com.datastax.driver.core.DefaultResultSetFuture.getUninterruptibly(DefaultResultSetFuture.java:172)
at com.datastax.driver.core.SessionManager.execute(SessionManager.java:92)
I have what I believe is the exact same issue (Write attempt on defunct connection) on my development machine intermittently.
It seems to happen when my dev machine goes to sleep while the server is up. Obviously there's no power management in the AWS cluster you're running, but it gives you a hint - the key is that something is breaking your control connection or intermittently preventing network connectivity between your hosts.
You should see the reconnection thread in your logs:
21:34:51.616 [Reconnection-1] ERROR c.d.driver.core.ControlConnection - [Control connection] Cannot connect to any host, scheduling retry in 2000 milliseconds
The next request after this will always succeed in my experience.
TL;DR - check for networking issues or any intermittent shutdown of servers that could break the control connection. The driver should do a better job of re-establishing broken control connections; it sounds like they're working on it for JAVA-250.
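In the meantime, here is a minimal sketch of driver-side settings that can help when the control connection is broken by idle timeouts or sleeping machines; the contact point and delay are placeholders, not values from the question.

import com.datastax.driver.core.Cluster;
import com.datastax.driver.core.SocketOptions;
import com.datastax.driver.core.policies.ConstantReconnectionPolicy;

public class ClusterConfigSketch {
    public static Cluster buildCluster() {
        SocketOptions socketOptions = new SocketOptions()
                .setKeepAlive(true);               // ask the OS to keep idle TCP connections alive
        return Cluster.builder()
                .addContactPoint("cassandra-host") // placeholder contact point
                .withSocketOptions(socketOptions)
                .withReconnectionPolicy(new ConstantReconnectionPolicy(1000L)) // retry every second
                .build();
    }
}

With an aggressive reconnection policy the control connection is re-established quickly, so the first request after an idle period is less likely to hit a defunct connection.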

Disable locking of Neo4j graph database?

My application is populating a Neo4j graph database at /tmp/import.db. In addition to my unit tests, I like to use the Neo4j browser (AKA Neo4j Community) to do some digging in that same database. When the browser is running, my application crashes when it is run because the database is locked:
Exception in thread "main" java.lang.RuntimeException: Error starting org.neo4j.kernel.EmbeddedGraphDatabase, /tmp/import.db
at org.neo4j.kernel.InternalAbstractGraphDatabase.run(InternalAbstractGraphDatabase.java:330)
at org.neo4j.kernel.EmbeddedGraphDatabase.<init>(EmbeddedGraphDatabase.java:63)
at org.neo4j.graphdb.factory.GraphDatabaseFactory$1.newDatabase(GraphDatabaseFactory.java:92)
at org.neo4j.graphdb.factory.GraphDatabaseBuilder.newGraphDatabase(GraphDatabaseBuilder.java:198)
at org.neo4j.graphdb.factory.GraphDatabaseFactory.newEmbeddedDatabase(GraphDatabaseFactory.java:69)
at no.marcello.cmdb.Import.<init>(Import.java:34)
at no.marcello.cmdb.Main.main(Main.java:10)
Caused by: org.neo4j.kernel.lifecycle.LifecycleException: Component 'org.neo4j.kernel.StoreLockerLifecycleAdapter@5d20e46' was successfully initialized, but failed to start. Please see attached cause exception.
at org.neo4j.kernel.lifecycle.LifeSupport$LifecycleInstance.start(LifeSupport.java:509)
at org.neo4j.kernel.lifecycle.LifeSupport.start(LifeSupport.java:115)
at org.neo4j.kernel.InternalAbstractGraphDatabase.run(InternalAbstractGraphDatabase.java:307)
... 6 more
Caused by: org.neo4j.kernel.StoreLockException: Unable to obtain lock on store lock file: /tmp/import.db/store_lock. Please ensure no other process is using this database, and that the directory is writable (required even for read-only access)
at org.neo4j.kernel.StoreLocker.checkLock(StoreLocker.java:82)
at org.neo4j.kernel.StoreLockerLifecycleAdapter.start(StoreLockerLifecycleAdapter.java:44)
at org.neo4j.kernel.lifecycle.LifeSupport$LifecycleInstance.start(LifeSupport.java:503)
... 8 more
Caused by: java.io.IOException: Unable to lock sun.nio.ch.FileChannelImpl@70b0b186
at org.neo4j.kernel.impl.nioneo.store.FileLock.wrapFileChannelLock(FileLock.java:38)
at org.neo4j.kernel.impl.nioneo.store.FileLock.getOsSpecificFileLock(FileLock.java:93)
at org.neo4j.kernel.DefaultFileSystemAbstraction.tryLock(DefaultFileSystemAbstraction.java:89)
at org.neo4j.kernel.StoreLocker.checkLock(StoreLocker.java:74)
... 10 more
Now I have to neo4j stop and neo4j start between every run of my application to see the changes. My hands got tired of that.
Can I disable locking of the database when using the Neo4j browser? I'd like to do that for testing purposes, as it helps a lot to see how my database model evolves while I'm populating it.
Database systems -- small ones, anyway -- can often run in either of two modes: embedded or server. In embedded mode, the idea is that one and only one program can read and write to the database at a time. This is quite useful for many applications, and it allows the database to dispense with the code necessary to coordinate access among multiple programs, which eats up time, code, and processing power.
In server mode, the database management system itself runs as a separate program, and it is built to have multiple programs access it.
Based on the class in the error message above, you have an embedded database, so the answer to your question is "no, you can't do that in this mode". You can switch to using the server mode of Neo4j, I expect, but connecting to it will involve some code changes (a rough sketch follows below), and you then have the minor problem of making sure your database system is running when your program runs, etc.
So you can do it with this database data, but you have to change the mode in which you are running the database management system.
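A rough sketch of what server mode looks like from the application side. This uses the official Bolt Java driver, which applies to current Neo4j versions rather than the embedded 2.x API in the stack trace, so treat it as an illustration of the idea only; the credentials are placeholders.

import org.neo4j.driver.AuthTokens;
import org.neo4j.driver.Driver;
import org.neo4j.driver.GraphDatabase;
import org.neo4j.driver.Session;
import org.neo4j.driver.Values;

public class ServerModeImport {
    public static void main(String[] args) {
        // The database runs as a separate server process; several clients (your app, the browser)
        // can connect to it at the same time, so there is no store_lock contention.
        try (Driver driver = GraphDatabase.driver("bolt://localhost:7687",
                                                  AuthTokens.basic("neo4j", "password")); // placeholder credentials
             Session session = driver.session()) {
            session.run("CREATE (n:Imported {name: $name})", Values.parameters("name", "example"));
        }
    }
}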

SQLException: Protocol violation. Oracle JDBC Driver issue

I'm getting the following exception:
java.sql.SQLException: Protocol violation
at oracle.jdbc.driver.DatabaseError.throwSqlException(DatabaseError.java:145)
at oracle.jdbc.driver.DatabaseError.throwSqlException(DatabaseError.java:190)
at oracle.jdbc.driver.DatabaseError.throwSqlException(DatabaseError.java:286)
at oracle.jdbc.driver.T4C8Oall.receive(T4C8Oall.java:766)
at oracle.jdbc.driver.T4CPreparedStatement.doOall8(T4CPreparedStatement.java:216)
at oracle.jdbc.driver.T4CPreparedStatement.fetch(T4CPreparedStatement.java:1225)
at oracle.jdbc.driver.OracleResultSetImpl.close_or_fetch_from_next(OracleResultSetImpl.java:373)
at oracle.jdbc.driver.OracleResultSetImpl.next(OracleResultSetImpl.java:284)
The Oracle system is running 10.2.0.3.0 on Solaris 5.10. The JDBC driver is running on JDK 1.6.0_21 (if it's important, the Java is running on a Solaris 5.10 machine as well). I've tried several different Oracle thin drivers, including the latest and the one that appears to exactly match the Oracle version.
The query I'm running is fairly simple: "select * from some_table order by key1, key2, key3". I then iterate through the result set and write to a file. The table has around 12 million rows, so I expect the process to run long, but it seems to die 5-15 minutes in. Each time I run it, it blows up on a different row, so I don't think the problem is with the data.
I found the Oracle alert log but couldn't tell whether anything in there was related to my process. Still, I'm no Oracle expert, and perhaps there's an Oracle setting I need to look at. Strangely enough, I'm running about five of these types of queries (a couple are a bit more complicated) on different connections, and only the two simplest ones ever get this problem.
Any help or ideas on what to look at to narrow down the problem would be appreciated.
For future Googlers who arrive at this page, here is the problem we had.
The protocol violation exception was being logged in the application logs and the Oracle trace.
Oracle trace
This is the error from the Oracle trace files:
--- PROTOCOL VIOLATION DETECTED ---
----- Dump Cursor sql_id=1j5kjnkncpp xsc=0x2a053a2a0 cur=0x2a052f1cf0 ---
----- Current SQL Statement for this session (sql_id=1jjns4k6npp) -----
select xyz
From the application logs
Caused by: org.springframework.jdbc.UncategorizedSQLException: SqlMapClient operation; uncategorized SQLException for SQL []; SQL state [72000]; error code [20000];
Symptom
This exception was happening occasionally. The stack trace had different SQL in it, which was very confusing. Running the SQL with SQL*Plus worked fine.
Root Cause
The exception was thrown when the Oracle driver was trying to export CLOB data. This was happening with only a few records, not all of them. The data itself was a file; visually we could not make out what was wrong with it.
Why were we seeing errors in the Oracle logs?
So if this was a driver defect, why did we see the error in the Oracle trace? Logically, driver errors should be confined to the application logs.
The reason was that when the protocol violation happened, the connection became corrupted. This corrupted connection was returned to the connection pool, and any user or job that later used it would not work and would experience the error.
That is why it happened at random places, with random users.
Solution
A short-term fix was to change this property in the connection pool. We are using the DBCP connection pool.
Changed from
ds.setTestOnBorrow(false);
to
ds.setTestOnBorrow(true);
Now, when a corrupted connection is returned to the pool, the pool tests it for validity before the app borrows it. If the connection is unusable, the pool discards it and the app gets a new/valid connection.
If you enable connection pool logs, you should see the exception that is normally swallowed.
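A minimal sketch of that pool configuration; the URL and credentials are placeholders, and setValidationQuery is an assumption about how the validity check is wired up with DBCP.

import org.apache.commons.dbcp.BasicDataSource;

public class PoolConfigSketch {
    public static BasicDataSource createDataSource() {
        BasicDataSource ds = new BasicDataSource();
        ds.setDriverClassName("oracle.jdbc.OracleDriver");
        ds.setUrl("jdbc:oracle:thin:@dbhost:1521:ORCL"); // placeholder host/SID
        ds.setUsername("user");
        ds.setPassword("password");
        ds.setValidationQuery("SELECT 1 FROM DUAL");     // lightweight query used to test borrowed connections
        ds.setTestOnBorrow(true);                        // discard corrupted connections before handing them out
        return ds;
    }
}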
Driver Upgrade
Upgrading from OJDBC 12.1.0.1 to OJDBC 12.1.0.2 solved the problem, even for the problematic rows.
Some other links for reference
https://confluence.atlassian.com/display/CONFKB/java.sql.SQLException%3A+Protocol+violation+caught+while+accessing+a+page+and+Oracle+DB+is+used
Apparently adding -d64 to the java command line fixes this problem. Looks like a Solaris 64-bit issue.
In my case, using Tomcat 8.5 (standard connection pool) and Oracle 19, when Oracle emits a warning message (like 'your password will expire in n days'), the Java connection object interprets it as an error.
I just changed the password (to the same value), and the problem was solved.
