I am trying to execute multiple BatchStatements in parallel with Java's ExecutorService, and I want to know whether each query executed successfully.
I have gone through:
how do I find out if the update query was successful or not in Cassandra Datastax
It says that if no exception is thrown, we can consider the query successful. But I am getting a NoHostAvailableException:
All host(s) tried for query failed (tried: *********************(com.datastax.driver.core.exceptions.OperationTimedOutException: [******************] Timed out waiting for server response))
Yet I can see my data in Cassandra. Is there any way to know whether my document was created successfully in Cassandra?
Batches in Cassandra are different from batches in relational databases. They should be used only in a limited number of use cases, and they shouldn't be used for batching inserts/updates to multiple partitions unless it's really necessary (see the "misuse of batches" doc).
A batch will eventually be replayed even if the driver received an error back, because a logged batch is replicated to other nodes before execution. See the following diagram for details.
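In practice that means "no exception" signals success, while a timeout leaves the outcome unknown until you read the data back. Below is a minimal sketch (DataStax Java driver 3.x; the prepared statements, id, and payload are assumed placeholders) of submitting a batch on an ExecutorService and verifying the write after a NoHostAvailableException:

import java.util.concurrent.ExecutorService;
import java.util.concurrent.Future;
import com.datastax.driver.core.*;
import com.datastax.driver.core.exceptions.NoHostAvailableException;

class BatchVerifier {
    // insertStmt/selectStmt are assumed prepared statements on the same table.
    static Future<Boolean> submitBatch(ExecutorService pool, Session session,
                                       PreparedStatement insertStmt,
                                       PreparedStatement selectStmt,
                                       Object id, Object payload) {
        return pool.submit(() -> {
            BatchStatement batch = new BatchStatement(BatchStatement.Type.LOGGED);
            batch.add(insertStmt.bind(id, payload));
            try {
                session.execute(batch);   // no exception => write accepted
                return true;
            } catch (NoHostAvailableException e) {
                // Outcome unknown: the logged batch may still be replayed.
                // Verify by reading the row back at QUORUM.
                Statement check = selectStmt.bind(id)
                        .setConsistencyLevel(ConsistencyLevel.QUORUM);
                return session.execute(check).one() != null;
            }
        });
    }
}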
I am currently using Spring's MongoDB persistence layer to query MongoDB. The collection I query contains about 4 GB of data. When I run the find code in my IDE, it retrieves the data. However, when I run the same code on my server, it freezes for about 15 to 20 minutes and eventually throws the error below.

My concern is that it runs without a hitch in my IDE on my 4 GB RAM Windows PC, yet fails on my 14 GB RAM server. I have looked through the Mongo log and there is nothing there that points to the problem. I also assumed the problem might be environmental, since it works locally, but the libraries on my local PC are the same as those on my server. Has anyone had this kind of issue, or can anyone point me to what I'm doing wrong? Also, weirdly, the find operation works when I revert to the Mongo Java driver's find methods.
I'm using mongo-java-driver - 2.12.1
spring-data-mongodb - 1.7.0.RELEASE
See below sample find operation code and error message.
List<HTObject> empObjects = mongoOperations.find(new Query(Criteria.where("date").gte(dateS).lte(dateE)), HTObject.class);
The exception I get is:
09:42:01.436 [main] DEBUG o.s.data.mongodb.core.MongoDbUtils - Getting Mongo Database name=[Hansard]
Exception in thread "main" org.springframework.dao.DataAccessResourceFailureException: Cursor 185020098546 not found on server 172.30.128.155:27017; nested exception is com.mongodb.MongoException$CursorNotFound: Cursor 185020098546 not found on server 172.30.128.155:27017
at org.springframework.data.mongodb.core.MongoExceptionTranslator.translateExceptionIfPossible(MongoExceptionTranslator.java:73)
at org.springframework.data.mongodb.core.MongoTemplate.potentiallyConvertRuntimeException(MongoTemplate.java:2002)
at org.springframework.data.mongodb.core.MongoTemplate.executeFindMultiInternal(MongoTemplate.java:1885)
at org.springframework.data.mongodb.core.MongoTemplate.doFind(MongoTemplate.java:1696)
at org.springframework.data.mongodb.core.MongoTemplate.doFind(MongoTemplate.java:1679)
at org.springframework.data.mongodb.core.MongoTemplate.find(MongoTemplate.java:598)
at org.springframework.data.mongodb.core.MongoTemplate.find(MongoTemplate.java:589)
at com.sa.dbObject.TestDb.main(TestDb.java:74)
Caused by: com.mongodb.MongoException$CursorNotFound: Cursor 185020098546 not found on server 172.30.128.155:27017
at com.mongodb.QueryResultIterator.throwOnQueryFailure(QueryResultIterator.java:218)
at com.mongodb.QueryResultIterator.init(QueryResultIterator.java:198)
at com.mongodb.QueryResultIterator.initFromQueryResponse(QueryResultIterator.java:176)
at com.mongodb.QueryResultIterator.getMore(QueryResultIterator.java:141)
at com.mongodb.QueryResultIterator.hasNext(QueryResultIterator.java:127)
at com.mongodb.DBCursor._hasNext(DBCursor.java:551)
at com.mongodb.DBCursor.hasNext(DBCursor.java:571)
at org.springframework.data.mongodb.core.MongoTemplate.executeFindMultiInternal(MongoTemplate.java:1871)
... 5 more
In short
The MongoDB result cursor is not available anymore on the server.
Explanation
This can happen when using sharding and a connection to a mongos fails over, or if you run into cursor timeouts (see http://docs.mongodb.org/manual/core/cursors/#closure-of-inactive-cursors).
You're performing a query that loads all objects into one list (mongoOperations.find). Depending on the result size, this may take a long time. Streaming the results with an Iterator can help, but even iterating over huge result sets hits limits at a certain point.
You should partition the results if you have to query very large amounts of data, using either paging (which gets slower the more records you skip) or by querying in splits of your range (you already have a date range, so this could work); a sketch follows.
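A minimal sketch of range splitting, assuming the HTObject class and "date" field from the question; the one-day chunk size is an arbitrary assumption to tune:

import java.util.ArrayList;
import java.util.Calendar;
import java.util.Date;
import java.util.List;
import org.springframework.data.mongodb.core.MongoOperations;
import org.springframework.data.mongodb.core.query.Criteria;
import org.springframework.data.mongodb.core.query.Query;

// Query [dateS, dateE) in day-sized chunks so each cursor is consumed
// quickly, well before the server reaps inactive cursors.
List<HTObject> findInChunks(MongoOperations mongoOperations, Date dateS, Date dateE) {
    List<HTObject> results = new ArrayList<>();
    Calendar cal = Calendar.getInstance();
    cal.setTime(dateS);
    while (cal.getTime().before(dateE)) {
        Date from = cal.getTime();
        cal.add(Calendar.DAY_OF_MONTH, 1);
        Date to = cal.getTime().before(dateE) ? cal.getTime() : dateE;
        results.addAll(mongoOperations.find(
                new Query(Criteria.where("date").gte(from).lt(to)),
                HTObject.class));
    }
    return results;
}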
The web application allows users to request huge amounts of information, and to cope with the load and timeouts I use JMS. Every time the data export functionality is called, the request is routed via JMS and the web request is released.
The JMS layer then calls an Oracle stored procedure, which takes a while (5-10 minutes) to execute. My initial thought was that the call is asynchronous, because the web request is released. However, WebLogic JMS has a 15-second timeout on the database connection, so after a while it kills the connection because there is no data in the pipe (the Oracle stored procedure is still busy pulling the necessary data).
So far I found the following solutions:
Increase the timeout. The data center support team was not very happy about this and pointed out that it is a problem of application design: the query has to be asynchronous across all layers, including JMS->Oracle.
Run the stored procedure as a job and close the JMS->Oracle connection once the call is initiated. The problem with this approach is that I need to poll Oracle constantly to find out when the job has completed.
The same as the second option, but with a JMS callback. However, a short read gave me the impression that such a solution isn't very popular, because it won't be generic (hard-coded values, etc.).
What would be your suggestions? Thank you in advance.
If the stored procedure cannot be optimized, I would simply extend the timeout. I think the JMS timeout can be overridden for a particular task via a Java annotation, so you would not even have to modify the global WebLogic setting.
Of course, there are ways to call the procedure asynchronously: either by using AQ (Oracle Advanced Queuing, which can act as a JMS provider) or by submitting it as a scheduler job. But there is a risk of overloading the database if you submit too many jobs to be executed in parallel. Both DBMS_JOB and the newer, preferred DBMS_SCHEDULER have ways to restrict the number of concurrent sessions. So instead of calling the stored procedure directly, you call a wrapper that submits a single-shot, non-repeating DBMS_SCHEDULER job; the scheduler then executes your procedure as a scheduler task. You have to negotiate this solution with your DBAs. There are views in Oracle (such as USER_SCHEDULER_JOBS) where you can check job status.
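As a rough illustration (the schema, procedure, and job names are placeholder assumptions), the one-shot job can also be submitted straight from Java over JDBC:

import java.sql.CallableStatement;
import java.sql.Connection;
import javax.sql.DataSource;

// Submit the long-running procedure as a one-shot, auto-dropping
// DBMS_SCHEDULER job so the JMS-side connection can be released immediately.
void submitExportJob(DataSource dataSource, String jobName) throws Exception {
    try (Connection conn = dataSource.getConnection();
         CallableStatement cs = conn.prepareCall(
             "BEGIN DBMS_SCHEDULER.CREATE_JOB(" +
             "  job_name   => ?," +
             "  job_type   => 'STORED_PROCEDURE'," +
             "  job_action => 'MY_SCHEMA.MY_EXPORT_PROC'," +  // placeholder
             "  enabled    => TRUE," +       // start right away
             "  auto_drop  => TRUE); END;")) {
        cs.setString(1, jobName);
        cs.execute();                         // returns immediately
    }
}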
If you were using Oracle Advanced Queuing, you could also submit a job into an Oracle AQ queue, and then have a single PL/SQL stored procedure (running as an "infinite" scheduler job) withdraw messages from the queue one by one and execute the desired stored procedure. When the procedure finishes, it can submit its response into another queue, and since AQ can act as a JMS provider on WebLogic, your application will be notified via JMS when the procedure finishes.
I have created an external table in Hive over HBase. While one user is logged in to the Hive shell running queries, no one else can run any query; when someone else tries to run a query, it gives the following error:
FAILED: Error in metadata: java.lang.RuntimeException: Unable to instantiate org.apache.hadoop.hive.metastore.HiveMetaStoreClient
FAILED: Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec.DDLTask
When the first shell exits, queries can be run again without problems. So it seems that my Hive setup cannot handle multiple clients. How do I enable it to handle multiple clients?
If you have not configured your Hive metastore, it's not possible for more than one user to access the Hive server:
Embedded Metastore: An embedded metastore is mainly used for unit tests. Only one process can connect to the metastore at a time, so it is not really a practical solution, but it works well for unit tests.
Check this link for more information.
I created a job which uses a reader of type org.springframework.batch.item.database.HibernateCursorItemReader to execute a query.
The problem is that the database is hitting its connection limit (I get the Oracle error ORA-12519, TNS:no appropriate service handler found) and, surprisingly, I see exit_code=EXECUTING and status=STARTED in the BATCH_STEP_EXECUTION table.
If I run the job again, it responds with "A job execution for this job is already running", and if I issue -restart on this task, it complains with the message "No failed or stopped execution found for job".
How does Spring Batch manage these fatal failure situations? Do I have to remove the execution information manually, or is there a reset option?
Thank you for any help
The current release of Spring Batch (2.2.0) doesn't appear to have an out-of-the-box solution for this situation. As discussed in this question, 'manual' intervention in the database may be required. Alternatively, if it is a particular job that is hanging (that is, you know the job name), you can do the following (a sketch follows the list):
call JobExplorer.findRunningJobExecutions(jobName) to fetch the running executions
go through the list of executions and 'fail' them (JobExecution.upgradeStatus(BatchStatus.FAILED))
save the change using JobRepository.update(jobExecution)
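Put together, a minimal sketch (the jobName value and the wiring of JobExplorer/JobRepository are assumed):

import java.util.Date;
import org.springframework.batch.core.BatchStatus;
import org.springframework.batch.core.JobExecution;
import org.springframework.batch.core.explore.JobExplorer;
import org.springframework.batch.core.repository.JobRepository;

// Mark every "running" execution of the stuck job as FAILED so it can be
// restarted instead of reporting "already running".
void failStuckExecutions(JobExplorer jobExplorer, JobRepository jobRepository,
                         String jobName) {
    for (JobExecution execution : jobExplorer.findRunningJobExecutions(jobName)) {
        execution.upgradeStatus(BatchStatus.FAILED);
        execution.setEndTime(new Date());  // an end time is needed for restart
        jobRepository.update(execution);
    }
}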
Just an FYI on why this connection-limit problem occurs when using a cursor item reader (JdbcCursorItemReader or HibernateCursorItemReader):
The cursor item reader opens a separate connection even if a connection is already open for the transaction (reader -> processor -> writer). So each step execution needs two connections, even if it runs in a single transaction against the same database. This causes the connection bottleneck, and so the number of database connections should be double the number of threads configured in the thread pool that executes the steps in parallel.
This can also be resolved by providing a separate DataSource to your cursor reader.
JdbcPagingItemReader is another ItemReader implementation, and it uses the same connection that is already open for the transaction; a configuration sketch follows.
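The sketch below assumes a placeholder table, columns, and MyEntity row type:

import java.util.Collections;
import javax.sql.DataSource;
import org.springframework.batch.item.database.JdbcPagingItemReader;
import org.springframework.batch.item.database.support.SqlPagingQueryProviderFactoryBean;
import org.springframework.jdbc.core.BeanPropertyRowMapper;

// Paging reader: each page is a short-lived query on the transaction's
// connection, so no extra cursor connection is held open.
JdbcPagingItemReader<MyEntity> pagingReader(DataSource dataSource) throws Exception {
    SqlPagingQueryProviderFactoryBean provider = new SqlPagingQueryProviderFactoryBean();
    provider.setDataSource(dataSource);
    provider.setSelectClause("SELECT id, name");
    provider.setFromClause("FROM my_table");
    provider.setSortKey("id");

    JdbcPagingItemReader<MyEntity> reader = new JdbcPagingItemReader<>();
    reader.setDataSource(dataSource);
    reader.setQueryProvider(provider.getObject());
    reader.setPageSize(100);
    reader.setRowMapper(new BeanPropertyRowMapper<>(MyEntity.class));
    reader.afterPropertiesSet();
    return reader;
}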
I'm using a JDBC driver to run "describe TABLE_NAME" on Hive. It gives me the following error:
NativeException: java.sql.SQLException: Query returned non-zero code: 9, cause: FAILED:
Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec.DDLTask
return code 1 doesn't tell me very much. How do I figure out what the underlying reason is?
It's most likely because your Hive metastore is not set up properly. Hive uses an RDBMS metastore to store metadata about its tables: table names, schemas, partitioning/bucketing/sorting columns, table-level statistics, etc.
By default, Hive uses an embedded Derby metastore, which can only be accessed by one process at a time. If you are using that, it's possible that you have multiple Hive sessions open, which causes this problem.
In any case, I would recommend setting up a standalone metastore for Hive. Embedded Derby was chosen because it works well for running tests and gives a usable out-of-the-box metastore; in my opinion, however, it's not fit for production workflows. You can find instructions on how to configure MySQL as the Hive metastore here; a sketch of the relevant properties follows.
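For illustration, the relevant hive-site.xml properties might look like this (host, database name, and credentials are placeholder assumptions):

<!-- hive-site.xml: point the metastore at a MySQL database instead of
     the single-process embedded Derby store. -->
<property>
  <name>javax.jdo.option.ConnectionURL</name>
  <value>jdbc:mysql://metastore-host:3306/hive_metastore</value>
</property>
<property>
  <name>javax.jdo.option.ConnectionDriverName</name>
  <value>com.mysql.jdbc.Driver</value>
</property>
<property>
  <name>javax.jdo.option.ConnectionUserName</name>
  <value>hive</value>
</property>
<property>
  <name>javax.jdo.option.ConnectionPassword</name>
  <value>hivepassword</value>
</property>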
Possibly you have another session open, since embedded Derby allows only one connection at a time.
You can check -
ps -wwwfu <your id>
and kill the process that is holding the Hive connection.
It may be because the table with the name you've specified doesn't exist in the database.
Try creating the table and running the command again. It should work. :)