Cassandra Exception - java

In my current project I'm using Cassandra and fetching data from it frequently: at least 30 DB requests hit every second, and each request needs to fetch at least 40,000 rows. The following is my current code; the method returns a HashMap.
public Map<String, String> loadObject(ArrayList<Integer> tradigAccountList) {
    com.datastax.driver.core.Session session;
    Map<String, String> orderListMap = new HashMap<>();
    List<ResultSetFuture> futures = new ArrayList<>();
    List<ListenableFuture<ResultSet>> Future;
    try {
        session = jdbcUtils.getCassandraSession();
        PreparedStatement statement = jdbcUtils.getCassandraPS(CassandraPS.LOAD_ORDER_LIST);
        for (Integer tradingAccount : tradigAccountList) {
            futures.add(session.executeAsync(statement.bind(tradingAccount).setFetchSize(3000)));
        }
        Future = Futures.inCompletionOrder(futures);
        for (ListenableFuture<ResultSet> future : Future) {
            for (Row row : future.get()) {
                orderListMap.put(row.getString("cliordid"), row.getString("ordermsg"));
            }
        }
    } catch (Exception e) {
        // exceptions (including timeouts) are silently swallowed here
    } finally {
    }
    return orderListMap;
}
My data request query is something like this,
"SELECT cliordid,ordermsg FROM omsks_v1.ordersStringV1 WHERE tradacntid = ?".
My Cassandra cluster has 2 nodes, each with 32 concurrent read and write threads, and my DB schema is as follows:
CREATE TABLE omsks_v1.ordersstringv1_copy1 (
tradacntid int,
cliordid text,
ordermsg text,
PRIMARY KEY (tradacntid, cliordid)
) WITH bloom_filter_fp_chance = 0.01
AND comment = ''
AND dclocal_read_repair_chance = 0.1
AND default_time_to_live = 0
AND gc_grace_seconds = 864000
AND max_index_interval = 2048
AND memtable_flush_period_in_ms = 0
AND min_index_interval = 128
AND read_repair_chance = 0.0
AND speculative_retry = '99.0PERCENTILE'
AND caching = {
'keys' : 'ALL',
'rows_per_partition' : 'NONE'
}
AND compression = {
'sstable_compression' : 'LZ4Compressor'
}
AND compaction = {
'class' : 'SizeTieredCompactionStrategy'
};
My problem is that I keep getting a Cassandra timeout exception. How can I optimize my code to handle all these requests?

It would be better if you attached a snippet of that exception (read/write exception). I assume you are getting a read timeout, since you are trying to fetch a large dataset in a single request:
For each request at least 40000 rows needed to fetch from Db
If the result set is too big, Cassandra throws an exception when the results cannot be returned within the time limit set in cassandra.yaml:
read_request_timeout_in_ms
You can increase the timeout, but this is not a good option. It may make the exception go away, yet the query will still take a long time to return its result.
Solution: for a big dataset, fetch the result using manual pagination (a range query) with a limit:
SELECT cliordid, ordermsg FROM omsks_v1.ordersStringV1
WHERE tradacntid = ? AND cliordid > ? LIMIT ?;
Or use a range query on the clustering column:
SELECT cliordid, ordermsg FROM omsks_v1.ordersStringV1
WHERE tradacntid = ? AND cliordid >= ? AND cliordid <= ?;
This will be much faster than fetching the whole result set in one go.
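A minimal Java sketch of that manual paging loop with the DataStax driver, reusing the session, tradingAccount and orderListMap names from the question; the 5000-row page size and the empty-string lower bound are assumptions you would tune:

PreparedStatement pageStmt = session.prepare(
        "SELECT cliordid, ordermsg FROM omsks_v1.ordersstringv1 "
      + "WHERE tradacntid = ? AND cliordid > ? LIMIT 5000");

String lastCliordid = "";                          // text clustering column: starts below any non-empty value
while (true) {
    ResultSet rs = session.execute(pageStmt.bind(tradingAccount, lastCliordid));
    int rowsInPage = 0;
    for (Row row : rs) {
        lastCliordid = row.getString("cliordid");
        orderListMap.put(lastCliordid, row.getString("ordermsg"));
        rowsInPage++;
    }
    if (rowsInPage < 5000) {
        break;                                     // last page for this trading account
    }
}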
You can also try reducing the fetch size; it will still return the whole result set, just in smaller pages. Use public Statement setFetchSize(int fetchSize) and check whether the exception is still thrown.
setFetchSize controls the page size, but it doesn't control the
maximum rows returned in a ResultSet.
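For illustration, a sketch of a reduced fetch size on the question's prepared statement (the 500-row page size is arbitrary); the driver still streams the full result set, just in smaller network pages:

Statement bound = statement.bind(tradingAccount).setFetchSize(500);
ResultSet rs = session.execute(bound);
for (Row row : rs) {    // the driver fetches the next page transparently while iterating
    orderListMap.put(row.getString("cliordid"), row.getString("ordermsg"));
}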
Another point to note: what is the size of tradigAccountList? Too many requests at a time can also lead to a timeout. A large tradigAccountList means many read requests are issued at once (Cassandra balances the requests, but how many it can handle depends on the cluster size and other factors), and that may cause this exception. Throttling the number of in-flight queries can help, as in the sketch below.
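A sketch of capping the in-flight async queries with a plain Semaphore around the question's loop; the limit of 64 concurrent requests is an assumption to tune for your cluster (needs java.util.concurrent.Semaphore and Guava's MoreExecutors):

final Semaphore inFlight = new Semaphore(64);          // assumed limit, tune for your cluster
for (Integer tradingAccount : tradigAccountList) {
    inFlight.acquireUninterruptibly();                 // blocks once 64 queries are already running
    ResultSetFuture future =
            session.executeAsync(statement.bind(tradingAccount).setFetchSize(3000));
    future.addListener(new Runnable() {
        public void run() {
            inFlight.release();                        // free a slot when the query completes
        }
    }, MoreExecutors.directExecutor());
    futures.add(future);
}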
Some related Links:
Cassandra read timeout
NoHostAvailableException With Cassandra & DataStax Java Driver If Large ResultSet
Cassandra .setFetchSize() on statement is not honoured

Related

Best way to get BigQuery temp table created by Job to read large data faster

I am trying to execute a query over a table in BigQuery using its Java client libraries. I create a Job and then get its result using the job.getQueryResults().iterateAll() method.
This works, but for large data (around 600k rows) it takes about 80-120 seconds. I see that BigQuery fetches the data in batches of 40-45k rows, each taking around 5-7 seconds.
I want to get the results faster. I found on the internet that if you can get the temporary table created by BigQuery for the Job and read the data from that table in Avro or some other format, it will be really fast, but in the BigQuery API (version 1.124.7) I don't see a way to do that.
Does anyone know how to do that in Java, or how to get data faster for a large number of records?
Any help is appreciated.
Code to read the table (takes 20 sec):
Table table = bigQueryHelper.getBigQueryClient().getTable(TableId.of("project", "dataset", "table"));
String format = "CSV";
String gcsUrl = "gs://name/test.csv";
Job job = table.extract(format, gcsUrl);
// Wait for the job to complete
try {
    Job completedJob = job.waitFor(RetryOption.initialRetryDelay(Duration.ofSeconds(1)),
            RetryOption.totalTimeout(Duration.ofMinutes(3)));
    if (completedJob != null && completedJob.getStatus().getError() == null) {
        log.info("job done");
        // Job completed successfully
    } else {
        log.info("job has error");
        // Handle error case
    }
} catch (InterruptedException e) {
    // Handle interrupted wait
}
Code to read the same table using a query (takes 90 sec):
Job job = bigQueryHelper.getBigQueryClient().getJob(JobId.of(jobId));
for (FieldValueList row : job.getQueryResults().iterateAll()) {
    System.out.println(row);
}
I tried several approaches and found the best way of doing it; posting it here to help someone in the future.
1: If we use job.getQueryResults().iterateAll() on the job, or directly on the table, it takes the same time. If we don't specify a page size, BigQuery fetches the data in batches of around 35-45k rows, so for 600k rows (180 MB) it takes 70-100 seconds. The page size can also be passed explicitly, as in the sketch below.
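A sketch of passing the page size explicitly through the client's query-results options (the 100,000-row figure is arbitrary; the call can throw InterruptedException):

TableResult result = job.getQueryResults(BigQuery.QueryResultsOption.pageSize(100_000));
for (FieldValueList row : result.iterateAll()) {
    // process the row
}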
2: We can take the temp table details from the created job and use the table's extract-job feature to write the result to GCS; this is faster and takes around 30-35 seconds. This approach does not download the data to the local machine; for that we would again need ...iterateAll() on the temp table, which takes the same time as in 1.
Example pseudo code:
try {
    Job job = getBigQueryClient().getJob(JobId.of(jobId));
    long start = System.currentTimeMillis();
    // FieldList list = getFields(job);
    Job completedJob =
            job.waitFor(
                    RetryOption.initialRetryDelay(Duration.ofSeconds(1)),
                    RetryOption.totalTimeout(Duration.ofMinutes(3)));
    if (completedJob != null && completedJob.getStatus().getError() == null) {
        log.info("job done");
        String gcsUrl = "gs://bucketname/test";
        // Getting the temp table information of the Job
        TableId destinationTableInfo =
                ((QueryJobConfiguration) job.getConfiguration()).getDestinationTable();
        log.info("Total time taken in getting schema ::{}", (System.currentTimeMillis() - start));
        Table table = bigQueryHelper.getBigQueryClient().getTable(destinationTableInfo);
        // Using an extract job to write the data to GCS
        Job newJob1 =
                table.extract(
                        CsvOptions.newBuilder().setFieldDelimiter("\t").build().toString(), gcsUrl);
        System.out.println("DestinationInfo::" + destinationTableInfo);
        Job completedJob1 =
                newJob1.waitFor(
                        RetryOption.initialRetryDelay(Duration.ofSeconds(1)),
                        RetryOption.totalTimeout(Duration.ofMinutes(3)));
        if (completedJob1 != null && completedJob1.getStatus().getError() == null) {
            log.info("job done");
        } else {
            log.info("job has error");
        }
    } else {
        log.info("job has error");
    }
} catch (InterruptedException e) {
    e.printStackTrace();
}
3: This is the best way and the one I wanted: it downloads/writes the result to a local file fastest, in around 20 seconds. This is the newer BigQuery Storage Read API, described in the links below:
https://cloud.google.com/bigquery/docs/reference/storage#background
https://cloud.google.com/bigquery/docs/reference/storage/libraries#client-libraries-install-java
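For completeness, a minimal sketch of reading a table through the Storage Read API (google-cloud-bigquerystorage); the project/dataset/table names are placeholders and the Avro decoding is left as a comment:

import com.google.api.gax.rpc.ServerStream;
import com.google.cloud.bigquery.storage.v1.BigQueryReadClient;
import com.google.cloud.bigquery.storage.v1.CreateReadSessionRequest;
import com.google.cloud.bigquery.storage.v1.DataFormat;
import com.google.cloud.bigquery.storage.v1.ReadRowsRequest;
import com.google.cloud.bigquery.storage.v1.ReadRowsResponse;
import com.google.cloud.bigquery.storage.v1.ReadSession;

public class StorageReadExample {
    public static void main(String[] args) throws Exception {
        String project = "my-project";                       // placeholder project id
        String table = String.format(
                "projects/%s/datasets/%s/tables/%s", project, "dataset", "table");

        try (BigQueryReadClient client = BigQueryReadClient.create()) {
            ReadSession.Builder sessionBuilder = ReadSession.newBuilder()
                    .setTable(table)
                    .setDataFormat(DataFormat.AVRO);         // Arrow is supported as well
            CreateReadSessionRequest request = CreateReadSessionRequest.newBuilder()
                    .setParent("projects/" + project)
                    .setReadSession(sessionBuilder)
                    .setMaxStreamCount(1)                    // raise for parallel streams
                    .build();
            ReadSession session = client.createReadSession(request);

            ReadRowsRequest readRequest = ReadRowsRequest.newBuilder()
                    .setReadStream(session.getStreams(0).getName())
                    .build();
            ServerStream<ReadRowsResponse> stream = client.readRowsCallable().call(readRequest);
            for (ReadRowsResponse response : stream) {
                // decode response.getAvroRows() with an Avro DatumReader using session.getAvroSchema()
            }
        }
    }
}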

"String or binary data would be truncated" error when the data has no issues

I have an error in a working POP3 daemon that is supposed to pull email data from the server and insert it into multiple local database tables. I see this issue in my log every second, and I believe it is hurting database performance and using up a lot of database pool connections.
After seeing this message I checked the length of my column data and also added code that keeps data out of the DB if it exceeds the specified length. But the error still occurs even after this. Oddly, executing this query by itself in the database inserts the data fine, but running it through WAS causes problems. SQL Server 2015. There are no triggers on the table.
[SQL Error] errorCode : 8152, sqlState : 22001, message : string or binary data would be truncated
INSERT INTO t_mail_rcvinfo
(
rcvInfoId,
mailId,
rcvType,
rcvIdType,
rcvId,
sortNo,
rcvName,
device,
regUserId,
regDate,
chgUserId,
chgDate
)
VALUES
(
'CA2MLe38cc3c33863bb3b26bd8a36edeebc01',
'CA2MLe38cc3be3863bb3b26bd8a360f3fa9c7',
'TO',
'EMAIL',
'datpt#email.com',
'3',
'datpt#email.com',
'PC',
null,
'2020-03-17 12:02:07.056',
null,
'2020-03-17 12:02:07.056'
)
// Code implemented after looking at the issue
for (int i = 0; i < list1.size(); i++) {
    MailRcvInfoVO infoVO = (MailRcvInfoVO) list1.get(i);
    if (infoVO.getRcvId().getBytes("UTF-8").length > 200) {
        rcvToIdLength = infoVO.getRcvId().getBytes("UTF-8").length;
        isValidMail = false;
        break;
    }
    if (infoVO.getRcvName().getBytes("UTF-8").length > 200) {
        rcvToNameLength = infoVO.getRcvName().getBytes("UTF-8").length;
        isValidMail = false;
        break;
    }
}
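One way to find which bound value actually exceeds its column is to compare every field against the real column sizes from the database metadata instead of the hard-coded 200, since the offending value may belong to a column other than rcvId/rcvName. A hedged JDBC sketch (the table name follows the insert above; connection is assumed to be an open java.sql.Connection):

DatabaseMetaData meta = connection.getMetaData();
try (ResultSet cols = meta.getColumns(null, null, "t_mail_rcvinfo", null)) {
    while (cols.next()) {
        String name = cols.getString("COLUMN_NAME");
        int size = cols.getInt("COLUMN_SIZE");   // declared length in characters
        System.out.println(name + " max length = " + size);
        // compare the corresponding value's length (characters for NVARCHAR,
        // not UTF-8 bytes) against size before executing the insert
    }
}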

writeConcern is not being set to Acknowledged in MongoDB

private val DATABASE: String = config.getString("db.dbname")
private val SERVER: ServerAddress = {
  val hostName = config.getString("db.hostname")
  val port = config.getString("db.port").toInt
  new ServerAddress(hostName, port)
}
val connectionMongo = MongoConnection(SERVER)
def collectionMongo(name: String) = connectionMongo(DATABASE)(name)
val result: WriteResult = collectionMongo("pgroup")
  .insert(new BasicDBObject("_id", privateArtGroup.getUuid)
    .append("ArtGroupStatus", privateArtGroup.artGroupStatus.toString())
    .append("isNew", privateArtGroup.isNew), WriteConcern.Acknowledged)
log.info("what is the write concern " + collectionMongo("pgroup").getWriteConcern)
I am setting WriteConcern to Acknowledged but it is not being applied. The log statement prints the following, which is how I know it is not set:
WriteConcern{w=0, wTimeout=null ms, fsync=null, journal=null}
Why w=0? It should be w=1.
I am using Casbah v3.1.1.
val result: WriteResult = collectionMongo("pgroup")
  .insert(new BasicDBObject("_id", privateArtGroup.getUuid)
    .append("ArtGroupStatus", privateArtGroup.artGroupStatus.toString())
    .append("isNew", privateArtGroup.isNew), WriteConcern.Acknowledged)
WriteConcern.Acknowledged - write operations that use this write concern will wait for acknowledgement from the primary server before returning.
w: 1 - requests acknowledgement that the write operation has propagated to the standalone mongod or to the primary in a replica set.
Reason for "why w=0?":
Once the given insert is executed with write concern Acknowledged, its job is done. Moreover, you are setting the write concern for that insert alone and not for the collection, so collectionMongo("pgroup").getWriteConcern still reports the collection's default. That could be the reason you are seeing w=0.
Still, I couldn't figure out why it reports w=0 at all, since w: 1 is the default write concern for MongoDB.
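If the goal is for getWriteConcern to report w=1 and for every write on the collection to be acknowledged, the write concern can be set on the collection (or client) itself. Casbah wraps the legacy MongoDB Java driver, so a plain-Java sketch of the idea looks like this (host, port and database name are placeholders):

import com.mongodb.DB;
import com.mongodb.DBCollection;
import com.mongodb.MongoClient;
import com.mongodb.WriteConcern;

public class WriteConcernExample {
    public static void main(String[] args) {
        MongoClient client = new MongoClient("localhost", 27017);   // placeholder host/port
        DB db = client.getDB("mydb");                               // placeholder database name
        DBCollection pgroup = db.getCollection("pgroup");

        // Applies to every write on this collection, not just a single insert
        pgroup.setWriteConcern(WriteConcern.ACKNOWLEDGED);

        System.out.println(pgroup.getWriteConcern());               // now reports w=1
        client.close();
    }
}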

Clearing batched PreparedStatements

I have a Java application which reads files and writes to an Oracle DB row by row.
We have come across a strange error during batch insert which does not occur during sequential insert. The error is strange because it only occurs with IBM JDK 7 on the AIX platform, and I get it on a different row every time. My code looks like this:
prpst = conn.prepareStatement(query);
while ((line = bf.readLine()) != null) {
    numLine++;
    batchInsert(prpst, line);
    //onebyoneInsert(prpst, line);
}

private static void batchInsert(PreparedStatement prpst, String line) throws IOException, SQLException {
    prpst.setString(1, "1");
    prpst.setInt(2, numLine);
    prpst.setString(3, line);
    prpst.setString(4, "1");
    prpst.setInt(5, 1);
    prpst.addBatch();
    // note: rows remaining after the last full batch of 200 are never flushed here
    if (++batchedLines == 200) {
        prpst.executeBatch();
        batchedLines = 0;
        prpst.clearBatch();
    }
}

private static void onebyoneInsert(PreparedStatement prpst, String line) throws Exception {
    int batchedLines = 0;
    prpst.setString(1, "1");
    prpst.setInt(2, numLine);
    prpst.setString(3, line);
    prpst.setString(4, "1");
    prpst.setInt(5, 1);
    prpst.executeUpdate();
}
I get this error in batch insert mode:
java.sql.BatchUpdateException: ORA-01461: can bind a LONG value only for insert into a LONG column
at oracle.jdbc.driver.OraclePreparedStatement.executeBatch(OraclePreparedStatement.java:10345)
I already know why this ORA error usually occurs, but that is not my case: I am fairly sure I am not binding a value that is too large for its column. Maybe I am hitting some bug in IBM JDK 7, but I could not prove it.
My question is whether there is a way to avoid this problem. Inserting one by one is not an option because we have big files and it would take too much time.
Try with
prpst.setInt(5, new Integer(1))
What is the type of the variable numLine?
Can you share the types of the columns corresponding to the fields you set in the PreparedStatement?
Try once more with onebyoneInsert and share the output for that case; it might help identify the root cause.
Also print the value of numLine to the console.
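If the problem has to be narrowed down inside the batch itself, one approach is to log any suspiciously long bind and to inspect the BatchUpdateException when a batch fails (needs java.sql.BatchUpdateException). A hedged sketch reusing the question's fields; the 4000-byte threshold is an assumption based on Oracle's VARCHAR2 limit, above which the driver may switch to a LONG bind:

private static void batchInsertWithDiagnostics(PreparedStatement prpst, String line)
        throws IOException, SQLException {
    // Flag rows whose payload exceeds 4000 bytes; these are candidates for a LONG bind
    int lineBytes = line.getBytes("UTF-8").length;
    if (lineBytes > 4000) {
        System.err.println("line " + numLine + " is " + lineBytes + " bytes");
    }
    prpst.setString(1, "1");
    prpst.setInt(2, numLine);
    prpst.setString(3, line);
    prpst.setString(4, "1");
    prpst.setInt(5, 1);
    prpst.addBatch();
    if (++batchedLines == 200) {
        try {
            prpst.executeBatch();
        } catch (BatchUpdateException e) {
            // getUpdateCounts() says how many rows of this batch succeeded,
            // which narrows down the row that triggered ORA-01461
            System.err.println("batch failed after " + e.getUpdateCounts().length
                    + " rows, near input line " + numLine);
            throw e;
        }
        prpst.clearBatch();
        batchedLines = 0;
    }
}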

JSON to SSTable tool out-of-memory failure

The json2sstable tool supplied with Cassandra 1.2.15 fails with an out-of-memory error. Back in 2011 a similar issue was reported as a bug and fixed: https://issues.apache.org/jira/browse/CASSANDRA-2189
Either I am missing some steps in the tool's configuration/usage or the bug has re-emerged. Please point out what I am missing.
Repro steps:
1) Cassandra 1.2.15, one table with a varchar key and one varchar column filled with random UUIDs, 6x10^6 records.
2) JSON file generated with the sstable2json tool (~1 GB).
3) Cassandra restarted with a new configuration (new data/cache/commit dirs, new partitioner).
4) Keyspace re-created.
5) json2sstable fails after several minutes of processing:
Exception in thread "main" java.lang.OutOfMemoryError: Java heap space
at java.util.Arrays.copyOfRange(Arrays.java:2694)
at java.lang.String.<init>(String.java:203)
at org.codehaus.jackson.util.TextBuffer.contentsAsString(TextBuffer.java:350)
at org.codehaus.jackson.impl.Utf8StreamParser.getText(Utf8StreamParser.java:278)
at org.codehaus.jackson.map.deser.std.UntypedObjectDeserializer.deserialize(UntypedObjectDeserializer.java:59)
at org.codehaus.jackson.map.deser.std.UntypedObjectDeserializer.mapArray(UntypedObjectDeserializer.java:165)
at org.codehaus.jackson.map.deser.std.UntypedObjectDeserializer.deserialize(UntypedObjectDeserializer.java:51)
at org.codehaus.jackson.map.deser.std.UntypedObjectDeserializer.mapArray(UntypedObjectDeserializer.java:165)
at org.codehaus.jackson.map.deser.std.UntypedObjectDeserializer.deserialize(UntypedObjectDeserializer.java:51)
at org.codehaus.jackson.map.deser.std.UntypedObjectDeserializer.mapObject(UntypedObjectDeserializer.java:204)
at org.codehaus.jackson.map.deser.std.UntypedObjectDeserializer.deserialize(UntypedObjectDeserializer.java:47)
at org.codehaus.jackson.map.deser.std.ObjectArrayDeserializer.deserialize(ObjectArrayDeserializer.java:104)
at org.codehaus.jackson.map.deser.std.ObjectArrayDeserializer.deserialize(ObjectArrayDeserializer.java:18)
at org.codehaus.jackson.map.ObjectMapper._readValue(ObjectMapper.java:2695)
at org.codehaus.jackson.map.ObjectMapper.readValue(ObjectMapper.java:1294)
at org.codehaus.jackson.JsonParser.readValueAs(JsonParser.java:1368)
at org.apache.cassandra.tools.SSTableImport.importUnsorted(SSTableImport.java:344)
at org.apache.cassandra.tools.SSTableImport.importJson(SSTableImport.java:328)
at org.apache.cassandra.tools.SSTableImport.main(SSTableImport.java:547)
From the json2sstable source code, the tool loads all the records from the JSON file into memory and sorts them by key:
private int importUnsorted(String jsonFile, ColumnFamily columnFamily, String ssTablePath, IPartitioner<?> partitioner) throws IOException
{
    int importedKeys = 0;
    long start = System.currentTimeMillis();
    JsonParser parser = getParser(jsonFile);
    Object[] data = parser.readValueAs(new TypeReference<Object[]>(){});
    keyCountToImport = (keyCountToImport == null) ? data.length : keyCountToImport;
    SSTableWriter writer = new SSTableWriter(ssTablePath, keyCountToImport);
    System.out.printf("Importing %s keys...%n", keyCountToImport);
    // sort by dk representation, but hold onto the hex version
    SortedMap<DecoratedKey, Map<?, ?>> decoratedKeys = new TreeMap<DecoratedKey, Map<?, ?>>();
    for (Object row : data)
    {
        Map<?, ?> rowAsMap = (Map<?, ?>) row;
        decoratedKeys.put(partitioner.decorateKey(hexToBytes((String) rowAsMap.get("key"))), rowAsMap);
        ....
According to Jonathan Ellis' comment on CASSANDRA-2322, this behavior is by design.
Thus json2sstable is not well suited for importing production-size data into Cassandra; the tool is likely to crash on large datasets. A possible workaround is to split the exported JSON into smaller chunks and import each chunk separately, as sketched below.
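A hedged sketch of that splitting step with Jackson's streaming API (the tool itself uses Jackson 1.x, the sketch uses Jackson 2): it reads the sstable2json output, which is one large top-level JSON array of rows, and writes it out as several smaller array files that can each be fed to json2sstable. File names and the 500,000-rows-per-chunk figure are placeholders:

import com.fasterxml.jackson.core.JsonEncoding;
import com.fasterxml.jackson.core.JsonFactory;
import com.fasterxml.jackson.core.JsonGenerator;
import com.fasterxml.jackson.core.JsonParser;
import com.fasterxml.jackson.core.JsonToken;
import com.fasterxml.jackson.databind.JsonNode;
import com.fasterxml.jackson.databind.ObjectMapper;
import java.io.File;

public class SplitJsonDump {
    public static void main(String[] args) throws Exception {
        ObjectMapper mapper = new ObjectMapper();
        JsonFactory factory = mapper.getFactory();
        int rowsPerChunk = 500000, rowsInChunk = 0, chunkNo = 0;

        try (JsonParser parser = factory.createParser(new File("dump.json"))) {
            if (parser.nextToken() != JsonToken.START_ARRAY) {
                throw new IllegalStateException("expected a top-level JSON array");
            }
            JsonGenerator out = newChunk(factory, chunkNo);
            while (parser.nextToken() == JsonToken.START_OBJECT) {
                JsonNode row = mapper.readTree(parser);   // one row object at a time
                mapper.writeTree(out, row);
                if (++rowsInChunk == rowsPerChunk) {
                    closeChunk(out);
                    out = newChunk(factory, ++chunkNo);
                    rowsInChunk = 0;
                }
            }
            closeChunk(out);
        }
    }

    private static JsonGenerator newChunk(JsonFactory factory, int n) throws Exception {
        JsonGenerator g = factory.createGenerator(new File("dump-part-" + n + ".json"), JsonEncoding.UTF8);
        g.writeStartArray();
        return g;
    }

    private static void closeChunk(JsonGenerator g) throws Exception {
        g.writeEndArray();
        g.close();
    }
}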
