private val DATABASE:String = config.getString("db.dbname")
private val SERVER:ServerAddress = {
val hostName=config.getString("db.hostname")
val port=config.getString("db.port").toInt
new ServerAddress(hostName,port)
}
val connectionMongo = MongoConnection(SERVER)
def collectionMongo(name:String) = connectionMongo(DATABASE)(name)
val result:WriteResult = collectionMongo("pgroup")
.insert(new BasicDBObject("_id",privateArtGroup.getUuid)
.append("ArtGroupStatus",privateArtGroup.artGroupStatus.toString())
.append("isNew",privateArtGroup.isNew), WriteConcern.Acknowledged)
log.info("what is the write concern " + collectionMongo(pgroup).getWriteConcern)
log.info("what is the write concern "+collectionMongo(pgroup).getWriteConcern)
I am setting WriteConcern to Acknowledged but its not setting
the log stataments prints this from where i get to know its not setting
What is the write concer WriteConcern{w=0, wTimeout=null ms, fsync=null, journal=null
Why w=0 ? it should be w=1
I am using casbah V 3.1.1
val result:WriteResult = collectionMongo("pgroup")
.insert(new BasicDBObject("_id",privateArtGroup.getUuid)
.append("ArtGroupStatus",privateArtGroup.artGroupStatus.toString())
.append("isNew",privateArtGroup.isNew), WriteConcern.Acknowledged)
WriteConcern.Acknowledged - Write operations that use this write concern will wait for acknowledgement from the primary server before returning.
w: 1 - Requests acknowledgement that the write operation has propagated to the standalone mongod or the primary in a replica set.
Reason for Why w=0 ? i
Once the given insert query is executed with writeconcern acknowledge the job is done. Moreover we are setting the writeconcern for the insert query alone and not for the collection. This could be a reason that you are getting w=0.
But still I couldn't figure out - In general we have w: 1 is the default write concern for MongoDB and why you are getting w=0.
Related
I'd like to join data coming in from two Kafka topics ("left" and "right").
Matching records are to be joined using an ID, but if a "left" or a "right" record is missing, the other one should be passed downstream after a certain timeout. Therefore I have chosen to use the coGroup function.
This works, but there is one problem: If there is no message at all, there is always at least one record which stays in an internal buffer for good. It gets pushed out when new messages arrive. Otherwise it is stuck.
The expected behaviour is that all records should be pushed out after the configured idle timeout has been reached.
Some information which might be relevant
Flink 1.14.4
The Flink parallelism is set to 8, so is the number of partitions in both Kafka topics.
Flink checkpointing is enabled
Event-time processing is to be used
Lombok is used: So val is like final var
Some code snippets:
Relevant join settings
public static final int AUTO_WATERMARK_INTERVAL_MS = 500;
public static final Duration SOURCE_MAX_OUT_OF_ORDERNESS = Duration.ofMillis(4000);
public static final Duration SOURCE_IDLE_TIMEOUT = Duration.ofMillis(1000);
public static final Duration TRANSFORMATION_MAX_OUT_OF_ORDERNESS = Duration.ofMillis(5000);
public static final Duration TRANSFORMATION_IDLE_TIMEOUT = Duration.ofMillis(1000);
public static final Time JOIN_WINDOW_SIZE = Time.milliseconds(1500);
Create KafkaSource
private static KafkaSource<JoinRecord> createKafkaSource(Config config, String topic) {
val properties = KafkaConfigUtils.createConsumerConfig(config);
val deserializationSchema = new KafkaRecordDeserializationSchema<JoinRecord>() {
#Override
public void deserialize(ConsumerRecord<byte[], byte[]> record, Collector<JoinRecord> out) {
val m = JsonUtils.deserialize(record.value(), JoinRecord.class);
val copy = m.toBuilder()
.partition(record.partition())
.build();
out.collect(copy);
}
#Override
public TypeInformation<JoinRecord> getProducedType() {
return TypeInformation.of(JoinRecord.class);
}
};
return KafkaSource.<JoinRecord>builder()
.setProperties(properties)
.setBootstrapServers(config.kafkaBootstrapServers)
.setTopics(topic)
.setGroupId(config.kafkaInputGroupIdPrefix + "-" + String.join("_", topic))
.setDeserializer(deserializationSchema)
.setStartingOffsets(OffsetsInitializer.latest())
.build();
}
Create DataStreamSource
Then the DataStreamSource is built on top of the KafkaSource:
Configure "max out of orderness"
Configure "idleness"
Extract timestamp from record, to be used for event time processing
private static DataStreamSource<JoinRecord> createLeftSource(Config config,
StreamExecutionEnvironment env) {
val leftKafkaSource = createLeftKafkaSource(config);
val leftWms = WatermarkStrategy
.<JoinRecord>forBoundedOutOfOrderness(SOURCE_MAX_OUT_OF_ORDERNESS)
.withIdleness(SOURCE_IDLE_TIMEOUT)
.withTimestampAssigner((joinRecord, __) -> joinRecord.timestamp.toEpochSecond() * 1000L);
return env.fromSource(leftKafkaSource, leftWms, "left-kafka-source");
}
Use keyBy
The keyed sources are created on top of the DataSource instances like this:
Again configure "out of orderness" and "idleness"
Again extract timestamp
val leftWms = WatermarkStrategy
.<JoinRecord>forBoundedOutOfOrderness(TRANSFORMATION_MAX_OUT_OF_ORDERNESS)
.withIdleness(TRANSFORMATION_IDLE_TIMEOUT)
.withTimestampAssigner((joinRecord, __) -> {
if (VERBOSE_JOIN)
log.info("Left : " + joinRecord);
return joinRecord.timestamp.toEpochSecond() * 1000L;
});
val leftKeyedSource = leftSource
.keyBy(jr -> jr.id)
.assignTimestampsAndWatermarks(leftWms)
.name("left-keyed-source");
Join using coGroup
The join then combines the left and the right keyed sources
val joinedStream = leftKeyedSource
.coGroup(rightKeyedSource)
.where(left -> left.id)
.equalTo(right -> right.id)
.window(TumblingEventTimeWindows.of(JOIN_WINDOW_SIZE))
.apply(new CoGroupFunction<JoinRecord, JoinRecord, JoinRecord>() {
#Override
public void coGroup(Iterable<JoinRecord> leftRecords,
Iterable<JoinRecord> rightRecords,
Collector<JoinRecord> out) {
// Transform
val result = ...;
out.collect(result);
}
Write stream to console
The resulting joinedStream is written to the console:
val consoleSink = new PrintSinkFunction<JoinRecord>();
joinedStream.addSink(consoleSink);
How can I configure this join operation, so that all records are pushed downstream after the configured idle timeout?
If it can't be done this way: Is there another option?
This is the expected behavior. withIdleness doesn't try to handle the case where all streams are idle. It only helps in cases where there are still events flowing from at least one source partition/shard/split.
To get the behavior you desire (in the context of a continuous streaming job), you'll have to implement a custom watermark strategy that advances the watermark based on a processing time timer. Here's an implementation that uses the legacy watermark API.
On the other hand, if the job is complete and you just want to drain the final results before shutting it down, you can use the --drain option when you stop the job. Or if you use bounded sources this will happen automatically.
I using the following method in order to truncate data from aerospike namespace.set.bins:
// Setting LUT
val calendar = Calendar.getInstance()
calendar.setTimeInMillis(startTime + 1262304000000L) // uses CITRUSLEAF_EPOCH - see https://discuss.aerospike.com/t/how-to-use-view-and-calulate-last-update-time-lut-for-the-truncate-command/4330
logger.info(s"truncate($startTime = ${calendar.getTime}, durableDelete = $durableDelete) on ${config.toRecoverMap}")
// Define Scan and Write Policies
val writePolicy = new WritePolicy()
val scanPolicy = new ScanPolicy()
writePolicy.durableDelete = durableDelete
scanPolicy.filterExp = Exp.build(Exp.le(Exp.lastUpdate(), Exp.`val`(calendar)))
// Scan all records such as LUT <= startTime
config.toRecoverMap.flatMap { case (namespace, mapOfSetsToBins) =>
for ((set, bins) <- mapOfSetsToBins) yield {
val recordCount = new AtomicInteger(0)
client.scanAll(scanPolicy, namespace, set, new ScanCallback() {
override def scanCallback(key: Key, record: Record): Unit = {
val requiresNullify = bins.filter(record.bins.containsKey(_)).distinct // Instead of making bulk requests which maybe not be needed and load AS
if (requiresNullify.nonEmpty) {
client.put(writePolicy, key, requiresNullify.map(Bin.asNull): _*)
logger.debug(s"${recordCount.incrementAndGet()}: (${requiresNullify.mkString(",")}) Bins of Record: $record with $key are set to NULL")
}
}
})
logger.info(s"Totally $recordCount records affected during the truncate operation on $namespace.$set.$bins")
recordCount.get
}
}
}
This is failed on:
...
2021-08-08 16:51:30,551 [Aerospike-6] DEBUG c.d.a.c.r.services.AerospikeService.scanCallback(55) - 33950: (IsActive) Bins of Record: (gen:3),(exp:0),(bins:(IsActive:0)) with test-recovery-set-multi-1:null:95001b26e70dbb35e1487802ebbc857eceb92246 are set to NULL
for reason:
Error -11,6,0,30000,0,5: Max retries exceeded: 5
com.aerospike.client.AerospikeException: Error -11,6,0,30000,0,5: Max retries exceeded: 5
at com.aerospike.client.query.PartitionTracker.isComplete(PartitionTracker.java:282)
at com.aerospike.client.command.ScanExecutor.scanPartitions(ScanExecutor.java:70)
at com.aerospike.client.AerospikeClient.scanAll(AerospikeClient.java:1519)
at com.aerospike.connect.reloader.services.AerospikeService.$anonfun$truncate$3(AerospikeService.scala:50)
at com.aerospike.connect.reloader.services.AerospikeService.$anonfun$truncate$3$adapted(AerospikeService.scala:48)
at scala.collection.Iterator$$anon$9.next(Iterator.scala:575)
at scala.collection.immutable.List.prependedAll(List.scala:153)
at scala.collection.immutable.List$.from(List.scala:651)
at scala.collection.immutable.List$.from(List.scala:648)
at scala.collection.IterableFactory$Delegate.from(Factory.scala:288)
at scala.collection.immutable.Iterable$.from(Iterable.scala:35)
at scala.collection.immutable.Iterable$.from(Iterable.scala:32)
at scala.collection.IterableOps$WithFilter.map(Iterable.scala:884)
at com.aerospike.connect.reloader.services.AerospikeService.$anonfun$truncate$1(AerospikeService.scala:48)
at scala.collection.StrictOptimizedIterableOps.flatMap(StrictOptimizedIterableOps.scala:117)
at scala.collection.StrictOptimizedIterableOps.flatMap$(StrictOptimizedIterableOps.scala:104)
at scala.collection.immutable.Map$Map1.flatMap(Map.scala:241)
at com.aerospike.connect.reloader.services.AerospikeService.truncate(AerospikeService.scala:47)
at com.aerospike.connect.reloader.tests.services.AerospikeServiceSpec.$anonfun$new$2(AerospikeServiceSpec.scala:23)
at org.scalatest.OutcomeOf.outcomeOf(OutcomeOf.scala:85)
at org.scalatest.OutcomeOf.outcomeOf$(OutcomeOf.scala:83)
at org.scalatest.OutcomeOf$.outcomeOf(OutcomeOf.scala:104)
at org.scalatest.Transformer.apply(Transformer.scala:22)
at org.scalatest.Transformer.apply(Transformer.scala:20)
at org.scalatest.wordspec.AnyWordSpecLike$$anon$3.apply(AnyWordSpecLike.scala:1077)
at org.scalatest.TestSuite.withFixture(TestSuite.scala:196)
at org.scalatest.TestSuite.withFixture$(TestSuite.scala:195)
at com.aerospike.connect.reloader.tests.services.AerospikeServiceSpec.withFixture(AerospikeServiceSpec.scala:13)
at org.scalatest.wordspec.AnyWordSpecLike.invokeWithFixture$1(AnyWordSpecLike.scala:1075)
at org.scalatest.wordspec.AnyWordSpecLike.$anonfun$runTest$1(AnyWordSpecLike.scala:1087)
at org.scalatest.SuperEngine.runTestImpl(Engine.scala:306)
at org.scalatest.wordspec.AnyWordSpecLike.runTest(AnyWordSpecLike.scala:1087)
at org.scalatest.wordspec.AnyWordSpecLike.runTest$(AnyWordSpecLike.scala:1069)
at com.aerospike.connect.reloader.tests.services.AerospikeServiceSpec.runTest(AerospikeServiceSpec.scala:13)
at org.scalatest.wordspec.AnyWordSpecLike.$anonfun$runTests$1(AnyWordSpecLike.scala:1146)
at org.scalatest.SuperEngine.$anonfun$runTestsInBranch$1(Engine.scala:413)
at scala.collection.immutable.List.foreach(List.scala:333)
at org.scalatest.SuperEngine.traverseSubNodes$1(Engine.scala:401)
at org.scalatest.SuperEngine.runTestsInBranch(Engine.scala:390)
at org.scalatest.SuperEngine.$anonfun$runTestsInBranch$1(Engine.scala:427)
at scala.collection.immutable.List.foreach(List.scala:333)
at org.scalatest.SuperEngine.traverseSubNodes$1(Engine.scala:401)
at org.scalatest.SuperEngine.runTestsInBranch(Engine.scala:396)
at org.scalatest.SuperEngine.runTestsImpl(Engine.scala:475)
at org.scalatest.wordspec.AnyWordSpecLike.runTests(AnyWordSpecLike.scala:1146)
at org.scalatest.wordspec.AnyWordSpecLike.runTests$(AnyWordSpecLike.scala:1145)
at com.aerospike.connect.reloader.tests.services.AerospikeServiceSpec.runTests(AerospikeServiceSpec.scala:13)
at org.scalatest.Suite.run(Suite.scala:1112)
at org.scalatest.Suite.run$(Suite.scala:1094)
at com.aerospike.connect.reloader.tests.services.AerospikeServiceSpec.org$scalatest$BeforeAndAfterAll$$super$run(AerospikeServiceSpec.scala:13)
at org.scalatest.BeforeAndAfterAll.liftedTree1$1(BeforeAndAfterAll.scala:213)
at org.scalatest.BeforeAndAfterAll.run(BeforeAndAfterAll.scala:210)
at org.scalatest.BeforeAndAfterAll.run$(BeforeAndAfterAll.scala:208)
at com.aerospike.connect.reloader.tests.services.AerospikeServiceSpec.org$scalatest$wordspec$AnyWordSpecLike$$super$run(AerospikeServiceSpec.scala:13)
at org.scalatest.wordspec.AnyWordSpecLike.$anonfun$run$1(AnyWordSpecLike.scala:1191)
at org.scalatest.SuperEngine.runImpl(Engine.scala:535)
at org.scalatest.wordspec.AnyWordSpecLike.run(AnyWordSpecLike.scala:1191)
at org.scalatest.wordspec.AnyWordSpecLike.run$(AnyWordSpecLike.scala:1189)
at com.aerospike.connect.reloader.tests.services.AerospikeServiceSpec.run(AerospikeServiceSpec.scala:13)
at org.scalatest.tools.SuiteRunner.run(SuiteRunner.scala:45)
at org.scalatest.tools.Runner$.$anonfun$doRunRunRunDaDoRunRun$13(Runner.scala:1320)
at org.scalatest.tools.Runner$.$anonfun$doRunRunRunDaDoRunRun$13$adapted(Runner.scala:1314)
at scala.collection.immutable.List.foreach(List.scala:333)
at org.scalatest.tools.Runner$.doRunRunRunDaDoRunRun(Runner.scala:1314)
at org.scalatest.tools.Runner$.$anonfun$runOptionallyWithPassFailReporter$24(Runner.scala:993)
at org.scalatest.tools.Runner$.$anonfun$runOptionallyWithPassFailReporter$24$adapted(Runner.scala:971)
at org.scalatest.tools.Runner$.withClassLoaderAndDispatchReporter(Runner.scala:1480)
at org.scalatest.tools.Runner$.runOptionallyWithPassFailReporter(Runner.scala:971)
at org.scalatest.tools.Runner$.run(Runner.scala:798)
at org.scalatest.tools.Runner.run(Runner.scala)
at org.jetbrains.plugins.scala.testingSupport.scalaTest.ScalaTestRunner.runScalaTest2or3(ScalaTestRunner.java:38)
at org.jetbrains.plugins.scala.testingSupport.scalaTest.ScalaTestRunner.main(ScalaTestRunner.java:25)
Any ideas why its happening?
LUT method:
def calculateCurrentLUT(): Long = {
logger.info("calculateCurrentLUTs() Triggered")
val policy = new WritePolicy()
policy.setTimeout(config.operationTimeoutInMillis)
val key = new Key(config.toRecover.head.namespace, AerospikeConfiguration.dummySetName, AerospikeConfiguration.dummyKey)
client.put(policy, key, new Bin(AerospikeConfiguration.dummyBin, "Used by the Recovery process to calculate current machine startTime"))
client.execute(policy, key, AerospikeConfiguration.packageName, "getLUT").asInstanceOf[Long]
}
with:
def registerUDFs(): RegisterTask = {
logger.info(s"registerUDFs() Triggered")
val policy = new WritePolicy()
policy.setTimeout(config.operationTimeoutInMillis)
client.registerUdfString(policy, """
|function getLUT(r)
| return record.last_update_time(r)
|end
|""", AerospikeConfiguration.packageName + ".lua", Language.LUA)
}
AerospikeException: Error -11,6,0,30000,0,5: Max retries exceeded: 5 means -11: error code, maximum retry attempts on this operation exceeded specified value. Shows 6 iterations (orig+maxretries) and you specified max retries at 5. Your connection settings are: 0 - for connectTimeout - wait to create initial socket, 0 is default, 30000 or 30s is your time to close an idle socket, 0 is the total timeout for this scan is operation - 0 means don't timeout which is correct for scans, 5 is the times you retried - looks like server is not responding back to client scan call in 30seconds and client closes the idle socket and retries and after 5 re-tries throws an Exception. Something is obviously wrong - check server log for more clues. For e.g. Are you using the correct server version that supports Expressions for scans? Second, I would check your computation of LUT comparison expression. if the filter expression is evaluating to false, scan will just return an EOF on completion, no matching records -but if socket times out before that, scan will go into a retry.
For my current project i'm using Cassandra Db for fetching data frequently. Within every second at least 30 Db requests will hit. For each request at least 40000 rows needed to fetch from Db. Following is my current code and this method will return Hash Map.
public Map<String,String> loadObject(ArrayList<Integer> tradigAccountList){
com.datastax.driver.core.Session session;
Map<String,String> orderListMap = new HashMap<>();
List<ResultSetFuture> futures = new ArrayList<>();
List<ListenableFuture<ResultSet>> Future;
try {
session =jdbcUtils.getCassandraSession();
PreparedStatement statement = jdbcUtils.getCassandraPS(CassandraPS.LOAD_ORDER_LIST);
for (Integer tradingAccount:tradigAccountList){
futures.add(session.executeAsync(statement.bind(tradingAccount).setFetchSize(3000)));
}
Future = Futures.inCompletionOrder(futures);
for (ListenableFuture<ResultSet> future : Future){
for (Row row: future.get()){
orderListMap.put(row.getString("cliordid"), row.getString("ordermsg"));
}
}
}catch (Exception e){
}finally {
}
return orderListMap;
}
My data request query is something like this,
"SELECT cliordid,ordermsg FROM omsks_v1.ordersStringV1 WHERE tradacntid = ?".
My Cassandra cluster has 2 nodes with 32 concurrent read and write thread for each and my Db schema as follow
CREATE TABLE omsks_v1.ordersstringv1_copy1 (
tradacntid int,
cliordid text,
ordermsg text,
PRIMARY KEY (tradacntid, cliordid)
) WITH bloom_filter_fp_chance = 0.01
AND comment = ''
AND dclocal_read_repair_chance = 0.1
AND default_time_to_live = 0
AND gc_grace_seconds = 864000
AND max_index_interval = 2048
AND memtable_flush_period_in_ms = 0
AND min_index_interval = 128
AND read_repair_chance = 0.0
AND speculative_retry = '99.0PERCENTILE'
AND caching = {
'keys' : 'ALL',
'rows_per_partition' : 'NONE'
}
AND compression = {
'sstable_compression' : 'LZ4Compressor'
}
AND compaction = {
'class' : 'SizeTieredCompactionStrategy'
};
My problem is getting Cassandra timeout exception, how to optimize my code to handle all these requests
It would be better if you would attach the snnipet of that Exception (Read/write exception). I assume you are getting read time out. You are trying to fetch a large data set on a single request.
For each request at least 40000 rows needed to fetch from Db
If you have a large record and resultset is too big, it throws exception if results cannot be returned within a time limit mentioned in Cassandra.yaml.
read_request_timeout_in_ms
You can increase the timeout but this is not a good option. It may resolve the issue (may not throw exception but will take more time to return result).
Solution: For big data set you can get the result using manual pagination (range query) with limit.
SELECT cliordid,ordermsg FROM omsks_v1.ordersStringV1
WHERE tradacntid > = ? and cliordid > ? limit ?;
Or use range query
SELECT cliordid,ordermsg FROM omsks_v1.ordersStringV1 WHERE tradacntid
= ? and cliordid >= ? and cliordid <= ?;
This will be much more faster than fetching the whole resultset.
You can also try by reducing the fetch size. Although it will return the whole resultset.
public Statement setFetchSize(int fetchSize) to check if exception is thrown.
setFetchSize controls the page size, but it doesn't control the
maximum rows returned in a ResultSet.
Another point to be noted:
What's the size of tradigAccountList?
Too many requests at a time also may lead to timeout. Large size of tradigAccountList and a lot of read requests are done at a time (load balancing of requests are handled by Cassandra and how many requests can be handled depends on cluster size and some other factors) may cause this exception .
Some related Links:
Cassandra read timeout
NoHostAvailableException With Cassandra & DataStax Java Driver If Large ResultSet
Cassandra .setFetchSize() on statement is not honoured
I'm getting this error when reading from a table in a 5 node cluster using datastax drivers.
2015-02-19 03:24:09,908 ERROR [akka.actor.default-dispatcher-9] OneForOneStrategy akka://user/HealthServiceChecker-49e686b9-e189-48e3-9aeb-a574c875a8ab Can't use this Cluster instance because it was previously closed
java.lang.IllegalStateException: Can't use this Cluster instance because it was previously closed
at com.datastax.driver.core.Cluster$Manager.init(Cluster.java:1128) ~[cassandra-driver-core-2.0.4.jar:na]
at com.datastax.driver.core.Cluster.init(Cluster.java:149) ~[cassandra-driver-core-2.0.4.jar:na]
at com.datastax.driver.core.Cluster.connect(Cluster.java:225) ~[cassandra-driver-core-2.0.4.jar:na]
at com.datastax.driver.core.Cluster.connect(Cluster.java:258) ~[cassandra-driver-core-2.0.4.jar:na]
I am able to connect using cqlsh and perform read operations.
Any clue what could be the problem here?
settings:
Consistency Level: ONE
keyspace replication strategy:
'class': 'NetworkTopologyStrategy',
'DC2': '1',
'DC1': '1'
cassandra version: 2.0.6
The code managing cassandra sessions is central and it is;
trait ConfigCassandraCluster
extends CassandraCluster
{
def cassandraConf: CassandraConfig
lazy val port = cassandraConf.port
lazy val host = cassandraConf.host
lazy val cluster: Cluster =
Cluster.builder()
.addContactPoints(host)
.withReconnectionPolicy(new ExponentialReconnectionPolicy(100, 30000))
.withPort(port)
.withSocketOptions(new SocketOptions().setKeepAlive(true))
.build()
lazy val keyspace = cassandraConf.keyspace
private lazy val casSession = cluster.connect(keyspace)
val session = new SessionProvider(casSession)
}
class SessionProvider(casSession: => Session) extends Logging {
var lastSuccessful: Long = 0
var firstSuccessful: Long = -1
def apply[T](fn: Session => T): T = {
val result = retry(fn, 15)
if(firstSuccessful < 0)
firstSuccessful = System.currentTimeMillis()
lastSuccessful = System.currentTimeMillis()
result
}
private def retry[T](fn: Session => T, remainingAttempts: Int): T = {
//retry logic
}
The problem is, cluster.connect(keyspace) will close the cluster itself if it experiences NoHostAvailableException. Due to that during retry logic, you are experiencing IllegalStateException.
Have a look at Cluster init() method and you will understand more.
The solution for your problem would be, in the retry logic, do Cluster.builder.addContactPoint(node).build.connect(keyspace). This will enable to have a new cluster object while you retry.
Search your code for session.close().
You are closing your connection somewhere as stated in the comments. Once a session is closed, it can't be used again. Instead of closing connections, pool them to allow for re-use.
I run the following code on my local (mac) machine and on a remote unix server.:
public void deleteValue(final String id, final String value) {
log.info("Removing value " + value);
final Collection<String> valuesBeforeRemoval = getValues(id);
final MutationBatch m = keyspace.prepareMutationBatch();
m.withRow(VALUES_CF, id).deleteColumn(value);
try {
m.execute();
} catch (final ConnectionException e) {
log.error("Unable to delete location " + value, e);
}
final Collection<String> valuesAfterRemoval = getValues(id);
if (valuesAfterRemoval.size()!=(valuesBeforeRemoval.size()-1)) {
log.error("value " + value + " was supposed to be removed from list " + valuesBeforeRemoval + " but it wasn't: " + valuesAfterRemoval);
}
...
}
protected Collection<String> getValues(final String id) {
try {
final OperationResult<ColumnList<String>> operationResult = keyspace
.prepareQuery(VALUES_CF).getKey(id).execute();
final ColumnList<String> result = operationResult.getResult();
if (result.isEmpty()) {
log.info("No value found for id: " + id);
return new ArrayList<String>();
}
return result.getColumnNames();
} catch (final ConnectionException e) {
log.error("Unable to retrieve session " + id, e);
}
return new ArrayList<String>();
}
Locally, that line is never executed, which makes sense:
log.error("value " + value + " was supposed to be removed from list " + valuesBeforeRemoval + " but it wasn't: " + valuesAfterRemoval);
but that line is executed on my dev server:
[ERROR] [main] [n.o.w.s.d.SessionDaoCassandraImpl] [2013-03-08 13:12:24,801]
[] - value 3 was supposed to be removed from list [3, 2, 1, 0, 7, 6, 5, 4, 9, 8] but it wasn't: [3, 2, 1, 0, 7, 6, 5, 4, 9, 8]
I am using com.netflix.astyanax
Both my local machine and the remote dev server connect to the very
same cassandra instance.
Both my local machine and the remote dev server run the very same test
creating a new row family, and adding 10 records before one is deleted.
When the error occurs on dev, log.error("Unable to delete
location " + value, e); was not executed (i.e. running the deletion
command didn't produce any exception).
I am 100% positive that no other code is affecting the content of the
database while I am running the test on dev so this isn't some
strange concurrency issue.
What could possibly explain that the deleteColumn(value) request runs without producing any error but still does not remove the column from the database?
ADDITIONAL INFO
Here is how I created the keyspace:
create keyspace sessiondata
with placement_strategy = 'org.apache.cassandra.locator.SimpleStrategy'
and strategy_options = {replication_factor:1};
Here is how I created the column family values, referenced as VALUES_CF in the code above:
create column family values
with comparator = UTF8Type
;
Here is how the keyspace referenced in the java code above is defined:
final AstyanaxContext.Builder contextBuilder = getBuilder();
final AstyanaxContext<Keyspace> keyspaceContext = contextBuilder
.forKeyspace(keyspaceName).buildKeyspace(
ThriftFamilyFactory.getInstance());
keyspaceContext.start();
keyspace = keyspaceContext.getEntity();
where getBuilder is:
private Builder getBuilder() {
final AstyanaxConfigurationImpl conf = new AstyanaxConfigurationImpl()
.setDiscoveryType(NodeDiscoveryType.NONE)
.setRetryPolicy(new RunOnce());
final ConnectionPoolConfigurationImpl poolConf = new ConnectionPoolConfigurationImpl("MyPool")
.setPort(port)
.setMaxConnsPerHost(1)
.setSeeds(value);
return new AstyanaxContext.Builder()
.forCluster(cluster)
.withAstyanaxConfiguration(conf)
.withConnectionPoolConfiguration(poolConf)
.withConnectionPoolMonitor(new CountingConnectionPoolMonitor());
}
SECOND UPDATE
First, the issues are not solely related to deletes. I observe similar problems when updating records in the database, reading them, and not being able to read the updates I just wrote
Second, I created a test that does 100 times the following operations:
write a row into cassandra
update that row in cassandra
read back that row from cassandra and check whether the row was indeed updated, and checking again regularly after delays if it wasn't
What I observe from that test is that:
again, when I run that code locally, all 100 iterations pass right away (no retry ever needed)
when I run that code on the remote server, some of the iterations pass, some fail. When they fail, no matter how large the delay (I wait up to 10 seconds), the test always fail.
At this point, I am really not sure how any cassandra setup could explain this behavior since I connect to the very same server for my tests and since the delays I insert are much larger than any additional latency I may need to run the test when connecting from my local machine.
The only relevant difference seems to be which machine the code is running on.
THIRD UPDATE
If in the test mentioned in the previous update, I insert a delay between the 2 writes, the code starts passing if the delay is >= 1,000 ms. A delay of, say, 100 ms doesn't help. I also modified the builder to set the default read and write consistencies to the most demanding: ALL, and that had no impact on the results of the test (still failing about half of the time unless delay between writes >1s):
final AstyanaxConfigurationImpl conf = new AstyanaxConfigurationImpl()
.setDiscoveryType(NodeDiscoveryType.NONE)
.setRetryPolicy(new RunOnce()).setDefaultReadConsistencyLevel(ConsistencyLevel.CL_ALL).setDefaultWriteConsistencyLevel(ConsistencyLevel.CL_ALL);
To debug, try printing the full row instead of just the column names. When I say the full row I mean the column name, column value and the time stamp. A long shot is clocks are wrong on one of your test machines and this is throwing out your tests on the other.
Another thing to double check is that ip is indeed what you think it is, in both your application and cassandra. When you retrieve it print it between something, like println("-" + ip "-"). Before and after your try block for the execute in deleteSecureLocation do a get for only that column, not the entire row. I'm not too sure how to do that in astynax, on the cli it would be get[id][ip].
Something to keep in mind is that a delete won't fail even if there's nothing to delete. To cassandra it's a write, the only thing that will make it a delete is if on read it's the latest timestamped entry against that row/column name.