DSE Cassandra - Why is Astyanax faster than DataStax java driver

I'm switching a Java application from using com.netflix.astyanax:astyanax-core:1.56.44 to com.datastax.cassandra:cassandra-driver-core:3.1.0.
Right off the bat, with a simple test that inserts a row with a randomly-generated key and then reads the row 1000 times, I'm seeing terrible performance compared to the code using Astyanax. Just using a single-node Cassandra instance running locally. The table I'm testing is simple -- just a blob primary key uuid column, and an int date column.
Here's the basic code with the DataStax driver:
class DataStaxCassandra
{
final Session session;
final PreparedStatement preparedIDWriteCmd;
final PreparedStatement preparedIDReadCmd;
DataStaxCassandra() // constructor: no return type, otherwise the final fields are never initialized
{
final PoolingOptions poolingOptions = new PoolingOptions()
.setConnectionsPerHost(HostDistance.LOCAL, 1, 2)
.setConnectionsPerHost(HostDistance.REMOTE, 1, 1)
.setMaxRequestsPerConnection(HostDistance.LOCAL, 128)
.setMaxRequestsPerConnection(HostDistance.REMOTE, 128)
.setPoolTimeoutMillis(0); // Don't ever wait for a connection to one host.
final QueryOptions queryOptions = new QueryOptions()
.setConsistencyLevel(ConsistencyLevel.LOCAL_ONE)
.setPrepareOnAllHosts(true)
.setReprepareOnUp(true);
final LoadBalancingPolicy dcAwareRRPolicy = DCAwareRoundRobinPolicy.builder()
.withLocalDc("my_laptop")
.withUsedHostsPerRemoteDc(0)
.build();
final LoadBalancingPolicy loadBalancingPolicy = new TokenAwarePolicy(dcAwareRRPolicy);
final SocketOptions socketOptions = new SocketOptions()
.setConnectTimeoutMillis(1000)
.setReadTimeoutMillis(1000);
final RetryPolicy retryPolicy = new LoggingRetryPolicy(DefaultRetryPolicy.INSTANCE);
Cluster.Builder clusterBuilder = Cluster.builder()
.withClusterName("test cluster")
.withPort(9042)
.addContactPoints("127.0.0.1")
.withPoolingOptions(poolingOptions)
.withQueryOptions(queryOptions)
.withLoadBalancingPolicy(loadBalancingPolicy)
.withSocketOptions(socketOptions)
.withRetryPolicy(retryPolicy);
// I've tried both V3 and V2, with lower connections/host and higher reqs/connection settings
// with V3, and it doesn't noticeably affect the test performance. Leaving it at V2 because the
// Astyanax version is using V2.
clusterBuilder.withProtocolVersion(ProtocolVersion.V2);
final Cluster cluster = clusterBuilder.build();
session = cluster.connect();
preparedIDWriteCmd = session.prepare(
"INSERT INTO \"mykeyspace\".\"mytable\" (\"uuid\", \"date\") VALUES (?, ?) USING TTL 38880000");
preparedIDReadCmd = session.prepare(
"SELECT \"date\" from \"mykeyspace\".\"mytable\" WHERE \"uuid\"=?");
}
public List<Row> execute(final Statement statement, final int timeout)
throws InterruptedException, ExecutionException, TimeoutException
{
final ResultSetFuture future = session.executeAsync(statement);
try
{
final ResultSet readRows = future.get(timeout, TimeUnit.MILLISECONDS);
final List<Row> resultRows = new ArrayList<>();
// How far we can go without triggering the blocking fetch:
int remainingInPage = readRows.getAvailableWithoutFetching();
for (final Row row : readRows)
{
resultRows.add(row);
if (--remainingInPage == 0) break;
}
return resultRows;
}
catch (final TimeoutException e)
{
future.cancel(true);
throw e;
}
}
private void insertRow(final byte[] id, final int date)
throws InterruptedException, ExecutionException, TimeoutException
{
final ByteBuffer idKey = ByteBuffer.wrap(id);
final BoundStatement writeCmd = preparedIDWriteCmd.bind(idKey, date);
writeCmd.setRoutingKey(idKey);
execute(writeCmd, 1000);
}
public int readRow(final byte[] id)
throws InterruptedException, ExecutionException, TimeoutException
{
final ByteBuffer idKey = ByteBuffer.wrap(id);
final BoundStatement readCmd = preparedIDReadCmd.bind(idKey);
readCmd.setRoutingKey(idKey);
final List<Row> idRows = execute(readCmd, 1000);
if (idRows.isEmpty()) return 0;
final Row idRow = idRows.get(0);
return idRow.getInt("date");
}
}
void perfTest()
{
final DataStaxCassandra ds = new DataStaxCassandra();
final int perfTestCount = 10000;
final long startTime = System.nanoTime();
for (int i = 0; i < perfTestCount; ++i)
{
final String id = UUIDUtils.generateRandomUUIDString();
final byte[] idBytes = Utils.hexStringToByteArray(id);
final int date = (int)(System.currentTimeMillis() / 1000);
try
{
ds.insertRow(idBytes, date);
final int dateRead = ds.readRow(idBytes);
assert(dateRead == date) : "Inserted ID with date " +date +" but date read is " +dateRead;
}
catch (final InterruptedException | ExecutionException | TimeoutException e)
{
System.err.println("ERROR reading ID (test " +(i+1) +") - " +e.toString());
}
}
System.out.println(
perfTestCount +" insert+reads took " +
TimeUnit.NANOSECONDS.toMillis(System.nanoTime() - startTime) +" ms");
}
Is there something I'm doing wrong that would yield poor performance? I was hoping it'd be a decent speed improvement, given that I'm using a pretty old version of Astyanax.
I've tried not wrapping the load balancing policy with TokenAwarePolicy, and getting rid of the "setRoutingKey" lines, just because I know these things definitely shouldn't help when just using a single node as I'm currently doing.
My local Cassandra version is 2.1.15 (which supports native protocol V3), but the machines in our production environment are running Cassandra 2.0.12.156 (which only supports V2).
Keep in mind that this is targeted at an environment with a bunch of nodes and several data centers, which is why I've got the settings the way I do (with the actual values being set from a config file), even though I know that for this test I could skip things like DCAwareRoundRobinPolicy.
Any help would be greatly appreciated! I can also post the code that's using Astyanax, I just figured first it'd be good to make sure nothing's blatantly wrong with my new code.
Thanks!
Tests of 10,000 writes+reads are taking around 30 seconds with the DataStax driver, while with Astyanax, they are taking in the 15-20 second range.
I upped the test count to 100,000 to see if maybe there's some overhead with the DataStax driver that just consumes ~10 seconds at startup, after which they might perform more similarly. But even with 100,000 read/writes:
AstyanaxCassandra 100,000 insert+reads took 156593 ms
DataStaxCassandra 100,000 insert+reads took 294340 ms
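To compare the two runs per operation rather than in total, here is a small arithmetic sketch (the class and method names are only illustrative). Because the test loop blocks on every insert and every read, each iteration is a pair of sequential round trips, so the totals mostly reflect per-request latency:

```java
import java.util.Locale;

final class PerOpLatency {
    /** Average wall-clock milliseconds per insert+read pair. */
    static double millisPerPair(long totalMillis, int pairs) {
        return (double) totalMillis / pairs;
    }

    public static void main(String[] args) {
        // Totals reported above for 100,000 insert+read pairs.
        double astyanax = millisPerPair(156_593L, 100_000);
        double dataStax = millisPerPair(294_340L, 100_000);
        System.out.printf(Locale.ROOT, "Astyanax: %.2f ms/pair%n", astyanax);
        System.out.printf(Locale.ROOT, "DataStax: %.2f ms/pair%n", dataStax);
        System.out.printf(Locale.ROOT, "Ratio:    %.2fx%n", dataStax / astyanax);
    }
}
```

With the totals above this works out to roughly 1.57 ms per pair for Astyanax versus 2.94 ms for the DataStax driver, a ratio of about 1.88x.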

Related

How to get optimal bulk insertion rate in DynamoDb through Executor Framework in Java?

I'm doing a POC on bulk writes (around 5.5k items) to a local DynamoDB using the DynamoDB SDK for Java. I'm aware that each bulk write cannot contain more than 25 write operations, so I divide the whole dataset into chunks of 25 items each and pass these chunks as callable actions to the Executor framework. Still, the result is not satisfactory: inserting the 5.5k records takes more than 100 seconds.
I'm not sure how else I can optimize this. While creating the table I provisioned the WriteCapacityUnits as 400 (not sure what the maximum value is) and experimented with it a bit, but it never made any difference. I have also tried changing the number of threads in the executor.
This is the main code to perform the bulk write operation:
public static void main(String[] args) throws Exception {
AmazonDynamoDBClient client = new AmazonDynamoDBClient().withEndpoint("http://localhost:8000");
final AmazonDynamoDB aws = new AmazonDynamoDBClient(new BasicAWSCredentials("x", "y"));
aws.setEndpoint("http://localhost:8000");
JSONArray employees = readFromFile();
Iterator<JSONObject> iterator = employees.iterator();
List<WriteRequest> batchList = new ArrayList<WriteRequest>();
ExecutorService service = Executors.newFixedThreadPool(20);
List<BatchWriteItemRequest> listOfBatchItemsRequest = new ArrayList<>();
while(iterator.hasNext()) {
if (batchList.size() == 25) {
Map<String, List<WriteRequest>> batchTableRequests = new HashMap<String, List<WriteRequest>>();
batchTableRequests.put("Employee", batchList);
BatchWriteItemRequest batchWriteItemRequest = new BatchWriteItemRequest();
batchWriteItemRequest.setRequestItems(batchTableRequests);
listOfBatchItemsRequest.add(batchWriteItemRequest);
batchList = new ArrayList<WriteRequest>();
}
PutRequest putRequest = new PutRequest();
putRequest.setItem(ItemUtils.fromSimpleMap((Map) iterator.next()));
WriteRequest writeRequest = new WriteRequest();
writeRequest.setPutRequest(putRequest);
batchList.add(writeRequest);
}
StopWatch watch = new StopWatch();
watch.start();
List<Future<BatchWriteItemResult>> futureListOfResults = listOfBatchItemsRequest.stream().
map(batchItemsRequest -> service.submit(() -> aws.batchWriteItem(batchItemsRequest))).collect(Collectors.toList());
service.shutdown();
while(!service.isTerminated());
watch.stop();
System.out.println("Total time taken : " + watch.getTotalTimeSeconds());
}
}
This is the code used to create the dynamoDB table:
public static void main(String[] args) throws Exception {
AmazonDynamoDBClient client = new AmazonDynamoDBClient().withEndpoint("http://localhost:8000");
DynamoDB dynamoDB = new DynamoDB(client);
String tableName = "Employee";
try {
System.out.println("Creating the table, wait...");
Table table = dynamoDB.createTable(tableName, Arrays.asList(new KeySchemaElement("ID", KeyType.HASH)
), Arrays.asList(new AttributeDefinition("ID", ScalarAttributeType.S)),
new ProvisionedThroughput(1000L, 1000L));
table.waitForActive();
System.out.println("Table created successfully. Status: " + table.getDescription().getTableStatus());
} catch (Exception e) {
System.err.println("Cannot create the table: ");
System.err.println(e.getMessage());
}
}

DynamoDB Local is provided as a tool for developers who need to develop offline for DynamoDB and is not designed for scale or performance. As such it is not intended for scale testing, and if you need to test bulk loads or other high velocity workloads it is best to use a real table. The actual cost incurred from dev testing on a live table is usually quite minimal as the tables only need to be provisioned for high capacity during the test runs.
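Independently of where the table lives, the chunk-and-submit pattern from the question can be sketched with only the JDK. This is a minimal sketch, not the SDK's API: BatchChunker and chunk are hypothetical names, and the submitted task is a stand-in that just reports the batch size where the real code would call batchWriteItem. Note that the loop in the question only enqueues a batch once it holds 25 items, so a trailing partial batch would never be sent; the helper below keeps it.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;

final class BatchChunker {
    /** Splits items into consecutive chunks of at most chunkSize, keeping the final partial chunk. */
    static <T> List<List<T>> chunk(List<T> items, int chunkSize) {
        if (chunkSize <= 0) {
            throw new IllegalArgumentException("chunkSize must be positive");
        }
        List<List<T>> chunks = new ArrayList<>();
        for (int from = 0; from < items.size(); from += chunkSize) {
            int to = Math.min(from + chunkSize, items.size());
            chunks.add(new ArrayList<>(items.subList(from, to)));
        }
        return chunks;
    }

    public static void main(String[] args) throws Exception {
        List<Integer> items = new ArrayList<>();
        for (int i = 0; i < 5510; i++) {
            items.add(i);
        }

        ExecutorService pool = Executors.newFixedThreadPool(20);
        List<Future<Integer>> results = new ArrayList<>();
        for (List<Integer> batch : chunk(items, 25)) {
            // Stand-in for the real batch write; it only reports the batch size.
            Callable<Integer> task = batch::size;
            results.add(pool.submit(task));
        }
        pool.shutdown();
        // Block until the work finishes instead of spinning on isTerminated().
        if (!pool.awaitTermination(1, TimeUnit.MINUTES)) {
            throw new IllegalStateException("batches did not finish in time");
        }
        int written = 0;
        for (Future<Integer> result : results) {
            written += result.get();
        }
        System.out.println(results.size() + " batches, " + written + " items");
    }
}
```

Using awaitTermination also avoids burning a CPU core in the empty while loop while the timing is being measured.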

How does apache spark works?

I am testing this framework, specifically the SQL module with the Java API. Running it standalone in Java, I get better results (even on the same machine) than with pure SQL (run through pgAdmin).
Executing exactly the same query against the same DB (Postgres 9.4 and 9.5), Spark is almost 5x faster. Below is my code:
private static final JavaSparkContext sc =
new JavaSparkContext(new SparkConf()
.setAppName("SparkJdbc").setMaster("local[*]"));
public static void main(String[] args) {
StringBuilder sql = new StringBuilder();
sql.append(" my query ");
long time = System.currentTimeMillis();
DbConnection dbConnection = new DbConnection("org.postgresql.Driver", "jdbc:postgresql://localhost:5432/mydb", "user", "pass");
JdbcRDD<Object[]> jdbcRDD = new JdbcRDD<>(sc.sc(), dbConnection, sql.toString(),
1, 1000000000, 20, new MapResult(), ClassManifestFactory$.MODULE$.fromClass(Object[].class));
JavaRDD<Object[]> javaRDD = JavaRDD.fromRDD(jdbcRDD, ClassManifestFactory$.MODULE$.fromClass(Object[].class));
// javaRDD.map((final Object[] record) -> {
// StringBuilder line = new StringBuilder();
// for (Object o : record) {
// line.append(o != null ? o.toString() : "").append(",");
// }
// return line.toString();
// }).collect().forEach(rec -> System.out.println(rec));
System.out.println("Total: " + javaRDD.count());
System.out.println("Total time: " + (System.currentTimeMillis() - time));
}
Even if I uncomment the code that prints the results to the console, it still runs faster.
I am wondering how it can be faster if the source (DB) is the same. Can anyone explain it? Maybe it uses parallel queries?
EDIT: Here is some relevant information about the Postgres processes: when idle, Postgres uses 12 processes, and running the query through pgAdmin does not change that number. Running it through Spark, the number goes up to 20, which probably means Spark uses some parallel mechanism to do the job.
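One plausible explanation, stated as an assumption rather than something confirmed in this thread: the JdbcRDD above is given a lower bound of 1, an upper bound of 1000000000 and 20 partitions, and JdbcRDD is documented to run the query once per partition with that partition's bounds bound to two ? placeholders in the SQL. With local[*], several of those partition queries can hit Postgres at the same time over separate connections. The sketch below only illustrates how such an inclusive range could be cut into 20 contiguous slices; RangeSplit is a hypothetical name and the arithmetic is my reading of the idea, not Spark's own code.

```java
import java.util.ArrayList;
import java.util.List;

final class RangeSplit {
    /** Splits the inclusive range [lower, upper] into numPartitions contiguous inclusive sub-ranges. */
    static List<long[]> split(long lower, long upper, int numPartitions) {
        if (numPartitions <= 0 || upper < lower) {
            throw new IllegalArgumentException("need numPartitions > 0 and upper >= lower");
        }
        // Bounds are inclusive, hence the + 1 here and the - 1 on each end below.
        // Plain long arithmetic is fine for these values; huge ranges would need BigInteger.
        long length = upper - lower + 1;
        List<long[]> ranges = new ArrayList<>();
        for (int i = 0; i < numPartitions; i++) {
            long start = lower + (i * length) / numPartitions;
            long end = lower + ((i + 1) * length) / numPartitions - 1;
            ranges.add(new long[] {start, end});
        }
        return ranges;
    }

    public static void main(String[] args) {
        // Same bounds and partition count as the JdbcRDD call in the question.
        List<long[]> ranges = split(1L, 1_000_000_000L, 20);
        long[] first = ranges.get(0);
        long[] last = ranges.get(ranges.size() - 1);
        System.out.println("partitions: " + ranges.size());
        System.out.println("first: [" + first[0] + ", " + first[1] + "]");
        System.out.println("last:  [" + last[0] + ", " + last[1] + "]");
    }
}
```

If that reading is right, each slice becomes its own query, which would fit the jump in Postgres process count seen in the EDIT.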

Apache Curator - Zookeeper connection loss exception, possible memory leak

I have been working on a process that continuously monitors a distributed atomic long counter. It monitors it every minute using the following class ZkClient's method getCounter. In fact, I have multiple threads running each of which are monitoring a different counter (distributed atomic long) stored in the Zookeeper nodes. Each thread specifies the path of the counter via the parameters of the getCounter method.
public class TagserterZookeeperManager {
public enum ZkClient {
COUNTER("10.11.18.25:2181"); // Integration URL
private CuratorFramework client;
private ZkClient(String servers) {
Properties props = TagserterConfigs.ZOOKEEPER.getProperties();
String zkFromConfig = props.getProperty("servers", "");
if (zkFromConfig != null && !zkFromConfig.isEmpty()) {
servers = zkFromConfig.trim();
}
ExponentialBackoffRetry exponentialBackoffRetry = new ExponentialBackoffRetry(1000, 3);
client = CuratorFrameworkFactory.newClient(servers, exponentialBackoffRetry);
client.start();
}
public CuratorFramework getClient() {
return client;
}
}
public static String buildPath(String ... node) {
StringBuilder sb = new StringBuilder();
for (int i = 0; i < node.length; i++) {
if (node[i] != null && !node[i].isEmpty()) {
sb.append("/");
sb.append(node[i]);
}
}
return sb.toString();
}
public static DistributedAtomicLong getCounter(String taskType, int hid, String jobId, String countType) {
String path = buildPath(taskType, hid+"", jobId, countType);
Builder builder = PromotedToLock.builder().lockPath(path + "/lock").retryPolicy(new ExponentialBackoffRetry(10, 10));
DistributedAtomicLong count = new DistributedAtomicLong(ZkClient.COUNTER.getClient(), path, new RetryNTimes(5, 20), builder.build());
return count;
}
}
From within the threads, this is how I am calling this method:
DistributedAtomicLong counterTotal = TagserterZookeeperManager
.getCounter("testTopic", hid, jobId, "test");
Now it seems like after the threads have run for a few hours, at one stage I start getting the following org.apache.zookeeper.KeeperException$ConnectionLossException exception inside the getCounter method where it tries to read the count:
org.apache.zookeeper.KeeperException$ConnectionLossException: KeeperErrorCode = ConnectionLoss for /contentTaskProd
at org.apache.zookeeper.KeeperException.create(KeeperException.java:99)
at org.apache.zookeeper.KeeperException.create(KeeperException.java:51)
at org.apache.zookeeper.ZooKeeper.exists(ZooKeeper.java:1045)
at org.apache.zookeeper.ZooKeeper.exists(ZooKeeper.java:1073)
at org.apache.curator.utils.ZKPaths.mkdirs(ZKPaths.java:215)
at org.apache.curator.utils.EnsurePath$InitialHelper$1.call(EnsurePath.java:148)
at org.apache.curator.RetryLoop.callWithRetry(RetryLoop.java:107)
at org.apache.curator.utils.EnsurePath$InitialHelper.ensure(EnsurePath.java:141)
at org.apache.curator.utils.EnsurePath.ensure(EnsurePath.java:99)
at org.apache.curator.framework.recipes.atomic.DistributedAtomicValue.getCurrentValue(DistributedAtomicValue.java:254)
at org.apache.curator.framework.recipes.atomic.DistributedAtomicValue.get(DistributedAtomicValue.java:91)
at org.apache.curator.framework.recipes.atomic.DistributedAtomicLong.get(DistributedAtomicLong.java:72)
...
I keep getting this exception from then on for a while, and I get the feeling it is causing internal memory leaks that eventually lead to an OutOfMemory error, at which point the whole process bails out. Does anybody have any idea what the reason for this could be? Why would ZooKeeper suddenly start throwing the connection loss exception? After the process bails out, I can manually connect to ZooKeeper through another small console program that I have written (also using Curator), and all looks good there.

To monitor a node in ZooKeeper using Curator, you can use NodeCache. This won't solve your connection problems, but instead of polling the node once a minute you get a push event when it changes.
In my experience, NodeCache handles disconnections and reconnections quite well.

What is the fastest way to bulk load data into HBase programmatically?

I have a Plain text file with possibly millions of lines which needs custom parsing and I want to load it into an HBase table as fast as possible (using Hadoop or HBase Java client).
My current solution is based on a MapReduce job without the Reduce part. I use FileInputFormat to read the text file so that each line is passed to the map method of my Mapper class. At this point the line is parsed to form a Put object which is written to the context. Then, TableOutputFormat takes the Put object and inserts it to table.
This solution yields an average insertion rate of 1,000 rows per second, which is less than what I expected. My HBase setup is in pseudo distributed mode on a single server.
One interesting thing is that during insertion of 1,000,000 rows, 25 Mappers (tasks) are spawned but they run serially (one after another); is this normal?
Here is the code for my current solution:
public static class CustomMap extends Mapper<LongWritable, Text, ImmutableBytesWritable, Put> {
protected void map(LongWritable key, Text value, Context context) throws IOException {
Map<String, String> parsedLine = parseLine(value.toString());
Put row = new Put(Bytes.toBytes(parsedLine.get(keys[1])));
for (String currentKey : parsedLine.keySet()) {
row.add(Bytes.toBytes(currentKey),Bytes.toBytes(currentKey),Bytes.toBytes(parsedLine.get(currentKey)));
}
try {
context.write(new ImmutableBytesWritable(Bytes.toBytes(parsedLine.get(keys[1]))), row);
} catch (InterruptedException e) {
// TODO Auto-generated catch block
e.printStackTrace();
}
}
}
public int run(String[] args) throws Exception {
if (args.length != 2) {
return -1;
}
conf.set("hbase.mapred.outputtable", args[1]);
// I got these conf parameters from a presentation about Bulk Load
conf.set("hbase.hstore.blockingStoreFiles", "25");
conf.set("hbase.hregion.memstore.block.multiplier", "8");
conf.set("hbase.regionserver.handler.count", "30");
conf.set("hbase.regions.percheckin", "30");
conf.set("hbase.regionserver.globalMemcache.upperLimit", "0.3");
conf.set("hbase.regionserver.globalMemcache.lowerLimit", "0.15");
Job job = new Job(conf);
job.setJarByClass(BulkLoadMapReduce.class);
job.setJobName(NAME);
TextInputFormat.setInputPaths(job, new Path(args[0]));
job.setInputFormatClass(TextInputFormat.class);
job.setMapperClass(CustomMap.class);
job.setOutputKeyClass(ImmutableBytesWritable.class);
job.setOutputValueClass(Put.class);
job.setNumReduceTasks(0);
job.setOutputFormatClass(TableOutputFormat.class);
job.waitForCompletion(true);
return 0;
}
public static void main(String[] args) throws Exception {
Long startTime = Calendar.getInstance().getTimeInMillis();
System.out.println("Start time : " + startTime);
int errCode = ToolRunner.run(HBaseConfiguration.create(), new BulkLoadMapReduce(), args);
Long endTime = Calendar.getInstance().getTimeInMillis();
System.out.println("End time : " + endTime);
System.out.println("Duration milliseconds: " + (endTime-startTime));
System.exit(errCode);
}

I've gone through a process that is probably very similar to yours, trying to find an efficient way to load data from an MR job into HBase. What I found to work is using HFileOutputFormat as the OutputFormatClass of the MR job.
Below is the core of the code I use to generate the job, along with the Mapper's map function that writes out the data. This was fast. We don't use it anymore, so I don't have numbers on hand, but it was around 2.5 million records in under a minute.
Here is the (stripped-down) function I wrote to generate the job for my MapReduce process to put data into HBase:
private Job createCubeJob(...) {
//Build and Configure Job
Job job = new Job(conf);
job.setJobName(jobName);
job.setMapOutputKeyClass(ImmutableBytesWritable.class);
job.setMapOutputValueClass(Put.class);
job.setMapperClass(HiveToHBaseMapper.class);//Custom Mapper
job.setJarByClass(CubeBuilderDriver.class);
job.setInputFormatClass(TextInputFormat.class);
job.setOutputFormatClass(HFileOutputFormat.class);
TextInputFormat.setInputPaths(job, hiveOutputDir);
HFileOutputFormat.setOutputPath(job, cubeOutputPath);
Configuration hConf = HBaseConfiguration.create(conf);
hConf.set("hbase.zookeeper.quorum", hbaseZookeeperQuorum);
hConf.set("hbase.zookeeper.property.clientPort", hbaseZookeeperClientPort);
HTable hTable = new HTable(hConf, tableName);
HFileOutputFormat.configureIncrementalLoad(job, hTable);
return job;
}
This is my map function from the HiveToHBaseMapper class (slightly edited ).
public void map(WritableComparable key, Writable val, Context context)
throws IOException, InterruptedException {
try{
Configuration config = context.getConfiguration();
String[] strs = val.toString().split(Constants.HIVE_RECORD_COLUMN_SEPARATOR);
String family = config.get(Constants.CUBEBUILDER_CONFIGURATION_FAMILY);
String column = strs[COLUMN_INDEX];
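// Assumption: the value column is numeric; parsed as a double so the comparison below compiles.
double value = Double.parseDouble(strs[VALUE_INDEX]);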
String sKey = generateKey(strs, config);
byte[] bKey = Bytes.toBytes(sKey);
Put put = new Put(bKey);
put.add(Bytes.toBytes(family), Bytes.toBytes(column), (value <= 0)
? Bytes.toBytes(Double.MIN_VALUE)
: Bytes.toBytes(value));
ImmutableBytesWritable ibKey = new ImmutableBytesWritable(bKey);
context.write(ibKey, put);
context.getCounter(CubeBuilderContextCounters.CompletedMapExecutions).increment(1);
}
catch(Exception e){
context.getCounter(CubeBuilderContextCounters.FailedMapExecutions).increment(1);
}
}
I'm pretty sure this isn't going to be a copy-and-paste solution for you. Obviously the data I was working with here didn't need any custom processing (that was done in an MR job before this one). The main thing I want you to take from this is HFileOutputFormat. The rest is just an example of how I used it. :)
I hope it gets you onto a solid path to a good solution.

One interesting thing is that during insertion of 1,000,000 rows, 25 Mappers (tasks) are spawned but they run serially (one after another); is this normal?
The mapreduce.tasktracker.map.tasks.maximum parameter, which defaults to 2, determines the maximum number of map tasks that can run in parallel on a node. Unless you change it, you should see 2 map tasks running simultaneously on each node.

UnavailableException() in Apache-Cassandra 0.8.2

I am new to Apache Cassandra 0.8.2. I am trying to insert some data but I am getting this exception:
Exception in thread "main" UnavailableException()
at org.apache.cassandra.thrift.Cassandra$insert_result.read(Cassandra.java:14902)
at org.apache.cassandra.thrift.Cassandra$Client.recv_insert(Cassandra.java:858)
at org.apache.cassandra.thrift.Cassandra$Client.insert(Cassandra.java:830)
at TestCassandra.main(TestCassandra.java:166)
My Code is:
public class TestCassandra {
public static void createKeySpace( Cassandra.Client client,String ksname)
throws TException, InvalidRequestException, UnavailableException, UnsupportedEncodingException, NotFoundException, TimedOutException, SchemaDisagreementException {
KsDef ksdef = new KsDef();
ksdef.name = ksname;
ksdef.strategy_class = "NetworkTopologyStrategy";
List l = new ArrayList();
ksdef.cf_defs =l;
client.system_add_keyspace(ksdef);
System.out.println("KeySpace Created");
}
public static void createColumnFamily(Cassandra.Client client,String ksname,String cfname)
throws TException, InvalidRequestException, UnavailableException, UnsupportedEncodingException, NotFoundException, TimedOutException, SchemaDisagreementException {
CfDef cfd = new CfDef(ksname, cfname);
client.system_add_column_family(cfd);
System.out.println("ColumnFamily Created");
}
public static void main(String[] args)
throws TException, InvalidRequestException, UnavailableException, UnsupportedEncodingException, NotFoundException, TimedOutException, SchemaDisagreementException {
TTransport tr = new TFramedTransport(new TSocket("localhost", 9160));
TProtocol proto = new TBinaryProtocol(tr);
Cassandra.Client client = new Cassandra.Client(proto);
tr.open();
String keySpace = "Keyspace1";
String columnFamily = "Users";
//Drop the Keyspace
client.system_drop_keyspace(keySpace);
//Creating keyspace
KsDef ksdef = new KsDef();
ksdef.name = keySpace;
ksdef.strategy_class = "NetworkTopologyStrategy";
List l = new ArrayList();
ksdef.cf_defs =l;
client.system_add_keyspace(ksdef);
System.out.println("KeySpace Created");
//createKeySpace(client,keySpace);
client.set_keyspace(keySpace);
//Creating column Family
CfDef cfd = new CfDef(keySpace, columnFamily);
client.system_add_column_family(cfd);
System.out.println("ColumnFamily Created");
//createColumnFamily(client,keySpace,columnFamily);
ColumnParent parent = new ColumnParent(columnFamily);
Column description = new Column();
description.setName("description".getBytes());
description.setValue("I’m a nice guy".getBytes());
description.setTimestamp(System.currentTimeMillis());
ConsistencyLevel consistencyLevel = ConsistencyLevel.ONE;
ByteBuffer rowid = ByteBuffer.wrap("0".getBytes());
//Line No. 166
client.insert(rowid, parent, description, consistencyLevel);
System.out.println("Record Inserted...");
tr.flush();
tr.close();
}
}
Can anybody help me understand why this happens?

The UnavailableException occurs because in your createKeySpace method you never specify a replication factor for your keyspace definition, KsDef.
Both strategy classes, NetworkTopologyStrategy and SimpleStrategy, require a replication factor to be set. In Cassandra 0.8 and higher there is no longer a replication_factor field in KsDef, so you have to add it to strategy_options yourself, like so (I've updated your code but not tested it; also note that I've changed your strategy_class to SimpleStrategy):
KsDef ksdef = new KsDef();
ksdef.name = ksname;
ksdef.strategy_class = SimpleStrategy.class.getName();
//Set replication factor
if (ksdef.strategy_options == null) {
ksdef.strategy_options = new LinkedHashMap<String, String>();
}
//Set replication factor, the value MUST be an integer
ksdef.strategy_options.put("replication_factor", "1");
//Cassandra must now create the Keyspace based on our KsDef
client.system_add_keyspace(ksdef);
For NetworkTopologyStrategy, you will need to specify a replication factor for each datacentre you've created (see explanation here).
For more information, view my Interfacing with Apache Cassandra 0.8 in Java blog.
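For the NetworkTopologyStrategy case, the options map would carry one entry per datacentre instead of a single replication_factor key. Here is a minimal, untested-against-a-cluster sketch of building such a map with plain JDK types. The StrategyOptions class and the names DC1 and DC2 are hypothetical; the real keys must match the datacentre names your snitch reports, and the resulting map is what you would assign to ksdef.strategy_options.

```java
import java.util.LinkedHashMap;
import java.util.Map;

final class StrategyOptions {
    /** Builds a datacentre-name -> replication-factor map whose values are integer strings. */
    static Map<String, String> perDatacenter(Map<String, Integer> replicasByDc) {
        Map<String, String> options = new LinkedHashMap<>();
        for (Map.Entry<String, Integer> entry : replicasByDc.entrySet()) {
            Integer replicas = entry.getValue();
            if (replicas == null || replicas < 0) {
                throw new IllegalArgumentException(
                        "replication factor must be a non-negative integer for " + entry.getKey());
            }
            // The option values are strings, but each must parse as an integer.
            options.put(entry.getKey(), Integer.toString(replicas));
        }
        return options;
    }

    public static void main(String[] args) {
        Map<String, Integer> replicas = new LinkedHashMap<>();
        replicas.put("DC1", 2); // hypothetical datacentre names
        replicas.put("DC2", 1);
        System.out.println(perDatacenter(replicas)); // prints {DC1=2, DC2=1}
    }
}
```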

Note that I ran into a similar problem (receiving many UnavailableExceptions) because I created a KsDef in code and inadvertently set the replication factor to 10 while testing on a cluster with only 3 nodes.
With a replication factor of 10, any QUORUM read or write would always fail because the quorum could never be reached (i.e. 10 / 2 + 1 = at least 6 nodes).
Fixing my replication factor fixed all the problems with the QUORUM consistency level.
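The quorum arithmetic above is easy to check before creating a keyspace. This is a small sketch with hypothetical names (QuorumCheck, quorum, quorumReachable); it assumes at most one replica per node, so the number of live replicas can never exceed the number of live nodes.

```java
final class QuorumCheck {
    /** Number of replicas that must respond for QUORUM: floor(RF / 2) + 1. */
    static int quorum(int replicationFactor) {
        return replicationFactor / 2 + 1;
    }

    /** True if a cluster with liveNodes nodes can ever satisfy QUORUM at this replication factor. */
    static boolean quorumReachable(int replicationFactor, int liveNodes) {
        // At most one replica per node, so live replicas cannot exceed live nodes.
        int maxLiveReplicas = Math.min(replicationFactor, liveNodes);
        return maxLiveReplicas >= quorum(replicationFactor);
    }

    public static void main(String[] args) {
        System.out.println(quorum(10));              // prints 6
        System.out.println(quorumReachable(10, 3));  // prints false
        System.out.println(quorumReachable(3, 3));   // prints true
    }
}
```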
