I recently started working with the Cassandra database and was able to set up a single-node cluster on my local machine.
Now I would like to write some sample data to Cassandra using the Pelops client.
Below are the keyspace and column family I have created so far:
create keyspace my_keyspace with placement_strategy = 'org.apache.cassandra.locator.SimpleStrategy' and strategy_options = {replication_factor:1};
use my_keyspace;
create column family users with column_type = 'Standard' and comparator = 'UTF8Type';
Below is the code I have so far. I made some progress: I was getting an exception earlier, which I was able to fix, but now I am getting another one.
public class MyPelops {
private static final Logger log = Logger.getLogger(MyPelops.class);
public static void main(String[] args) throws Exception {
// A comma separated List of Nodes
String NODES = "localhost";
// Thrift Connection Pool
String THRIFT_CONNECTION_POOL = "Test Cluster";
// Keyspace
String KEYSPACE = "my_keyspace";
// Column Family
String COLUMN_FAMILY = "users";
Cluster cluster = new Cluster(NODES, 9160);
Pelops.addPool(THRIFT_CONNECTION_POOL, cluster, KEYSPACE);
Mutator mutator = Pelops.createMutator(THRIFT_CONNECTION_POOL);
log.info("- Write Column -");
mutator.writeColumn(
COLUMN_FAMILY,
"Row1",
new Column().setName(" Name ".getBytes()).setValue(
" Test One ".getBytes()));
mutator.writeColumn(
COLUMN_FAMILY,
"Row1",
new Column().setName(" Work ".getBytes()).setValue(
" Engineer ".getBytes()));
log.info("- Execute -");
mutator.execute(ConsistencyLevel.ONE);
Selector selector = Pelops.createSelector(THRIFT_CONNECTION_POOL);
int columnCount = selector.getColumnCount(COLUMN_FAMILY, "Row1",
ConsistencyLevel.ONE);
log.info("- Column Count = " + columnCount);
List<Column> columnList = selector
.getColumnsFromRow(COLUMN_FAMILY, "Row1",
Selector.newColumnsPredicateAll(true, 10),
ConsistencyLevel.ONE);
log.info("- Size of Column List = " + columnList.size());
for (Column column : columnList) {
log.info("- Column: (" + new String(column.getName()) + ","
+ new String(column.getValue()) + ")");
}
log.info("- All Done. Exit -");
System.exit(0);
}
}
Whenever I run this program, I get this exception:
Exception in thread "main" org.scale7.cassandra.pelops.exceptions.InvalidRequestException: Column timestamp is required
The exception is thrown as soon as execution reaches the mutator.execute line.
As mentioned above, I am new to both Cassandra and the Pelops client; this is my first time working with them. Can anyone walk me through this problem step by step? I am running Cassandra 1.2.3 on my local box.
Any step-by-step guidance on how to insert data into Cassandra would help me a lot in understanding how it works.
Thanks in advance.
Each Cassandra column is a key-value-timestamp triplet.
You didn't set the timestamp on your columns:
Column c = new Column();
c.setTimestamp(System.currentTimeMillis());
You can also let the client create the column for you, which makes the job easier:
mutator.writeColumn(
COLUMN_FAMILY,
"Row1",
mutator.newColumn(" Name ", " Test One "));
This way you avoid both setting the timestamp (the client does it for you) and calling getBytes() on the strings.
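Putting it together, a minimal sketch of the two writes from your question, reusing the same pool name, column family, and row key, might look like this:
Mutator mutator = Pelops.createMutator(THRIFT_CONNECTION_POOL);
// newColumn builds the Column, converts the Strings to bytes, and sets the timestamp internally
mutator.writeColumn(COLUMN_FAMILY, "Row1", mutator.newColumn(" Name ", " Test One "));
mutator.writeColumn(COLUMN_FAMILY, "Row1", mutator.newColumn(" Work ", " Engineer "));
mutator.execute(ConsistencyLevel.ONE);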
Regards, Carlo
I am trying to iterate over the rows of a TableResult using getValues() as shown below.
If I use getValues(), it retrieves only the rows of the first page. I want to iterate over all the rows using getValues() and NOT using iterateAll().
In the code below, the problem is that the loop never ends: while(results.hasNextPage()) never becomes false. What is wrong with this code?
{
query = "select from aa.bb.cc";
QueryJobConfiguration queryConfig =
QueryJobConfiguration.newBuilder(query)
.setPriority(QueryJobConfiguration.Priority.BATCH)
.build();
TableResult results = bigquery.query(queryConfig);
int i = 0;
int j=0;
while(results.hasNextPage()) {
j++;
System.out.println("page " + j);
System.out.println("Data Extracted::" + i + " records");
for (FieldValueList row : results.getNextPage().getValues()) {
i++;
}
}
System.out.println("Total Count::" + results.getTotalRows());
System.out.println("Data Extracted::" + i + " records");
}
I have only 200,000 records in the source table. Below is the output; I had to stop the process manually.
page 1
Data Extracted::0 records
page 2
Data Extracted::85242 records
page 3
Data Extracted::170484 records
page 4
Data Extracted::255726 records
page 5
Data Extracted::340968 records
page 6
Data Extracted::426210 records
page 7
Data Extracted::511452 records
page 8
Data Extracted::596694 records
.......
.......
.......
.......
In short, you need to reassign your TableResult variable with the result of getNextPage(). If you don't update it, you keep looping over the same page again and again, which is why you are getting so many records in your output.
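A minimal sketch of the corrected loop, keeping the variable names from your snippet, could look roughly like this (the full example below follows the same idea):
TableResult results = bigquery.query(queryConfig);
int i = 0;
// count the rows of the first page
for (FieldValueList row : results.getValues()) {
    i++;
}
// advance to the next page and count it, until no pages are left
while (results.hasNextPage()) {
    results = results.getNextPage();
    for (FieldValueList row : results.getValues()) {
        i++;
    }
}
System.out.println("Data Extracted::" + i + " records");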
If you check the Bigquery Pagination and Using Java Client Library samples, there are ways to deal with paginated results, although they are not specific to single-run queries.
As shown in the code below, which is partially based on the pagination sample, you use the output of getNextPage() to update the results variable and perform the next iteration inside the while loop, until every page but the last has been processed.
QueryRun.Java
package com.projects;
// [START bigquery_query]
import com.google.cloud.bigquery.BigQuery;
import com.google.cloud.bigquery.BigQueryException;
import com.google.cloud.bigquery.BigQueryOptions;
import com.google.cloud.bigquery.QueryJobConfiguration;
import com.google.cloud.bigquery.TableResult;
import com.google.cloud.bigquery.Job;
import com.google.cloud.bigquery.JobId;
import com.google.cloud.bigquery.FieldValueList;
import com.google.cloud.bigquery.JobInfo;
import com.google.cloud.bigquery.BigQuery.QueryResultsOption;
import java.util.UUID;
public class QueryRun {
public static void main(String[] args) {
String projectId = "bigquery-public-data";
String datasetName = "covid19_ecdc_eu";
String tableName = "covid_19_geographic_distribution_worldwide";
String query =
"SELECT * "
+ " FROM `"
+ projectId
+ "."
+ datasetName
+ "."
+ tableName
+ "`"
+ " LIMIT 100";
System.out.println(query);
query(query);
}
public static void query(String query) {
try {
BigQuery bigquery = BigQueryOptions.getDefaultInstance().getService();
QueryJobConfiguration queryConfig = QueryJobConfiguration.newBuilder(query).build();
// Create a job ID so that we can safely retry.
JobId jobId = JobId.of(UUID.randomUUID().toString());
Job queryJob = bigquery.create(JobInfo.newBuilder(queryConfig).setJobId(jobId).build());
TableResult results = queryJob.getQueryResults(QueryResultsOption.pageSize(10));
int i = 0;
int j =0;
// get all paged data except last line
while(results.hasNextPage()) {
j++;
for (FieldValueList row : results.getValues()) {
i++;
}
results = results.getNextPage();
print_msg(i,j);
}
// last line run
j++;
for (FieldValueList row : results.getValues()) {
i++;
}
print_msg(i,j);
System.out.println("Query performed successfully.");
} catch (BigQueryException | InterruptedException e) {
System.out.println("Query not performed \n" + e.toString());
}
}
public static void print_msg(int i,int j)
{
System.out.println("page " + j);
System.out.println("Data Extracted::" + i + " records");
}
}
// [END bigquery_query]
output:
SELECT * FROM `bigquery-public-data.covid19_ecdc_eu.covid_19_geographic_distribution_worldwide` LIMIT 100
page 1
Data Extracted::10 records
page 2
Data Extracted::20 records
page 3
Data Extracted::30 records
page 4
Data Extracted::40 records
page 5
Data Extracted::50 records
page 6
Data Extracted::60 records
page 7
Data Extracted::70 records
page 8
Data Extracted::80 records
page 9
Data Extracted::90 records
page 10
Data Extracted::100 records
Query performed successfully.
As a final note, there is no official sample about pagination for queries, so I'm not totally sure of the recommended way to handle pagination with Java; it's not quite clear on the BigQuery for Java documentation page. If you can update your question with your own approach to pagination, I would appreciate it.
If you have issues running the attached sample, please see the Using the BigQuery Java client sample, its GitHub page, and the pom.xml file inside it, and check that your setup matches.
I am probably late with this response, but the Java client guide (https://cloud.google.com/bigquery/docs/quickstarts/quickstart-client-libraries#complete_source_code) says:
Iterate over the QueryResponse to get all the rows in the results. The iterator automatically handles pagination. Each FieldList exposes the columns by numeric index or column name.
That said, it should be easier to simply use the iterateAll() method.
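For reference, a minimal sketch of that approach, assuming a bigquery client and a queryConfig built as in the question, could look like this:
TableResult results = bigquery.query(queryConfig);
long count = 0;
// iterateAll() transparently requests the following pages while iterating
for (FieldValueList row : results.iterateAll()) {
    count++;
}
System.out.println("Data Extracted::" + count + " records");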
Let me know if I am wrong.
I'm trying to use the VMware SDK for Java to collect performance data for each entity (cluster/datastore/host/VM) in the VMware environment.
The idea is to get the available PerfMetricIds for the target entity with queryAvailablePerfMetric, query those and report the details of the counter, the timestamp and the value.
However, when I get the PerfMetricIds for an entity, not every detected (returned) PerfMetricId reports data. For example, for each datastore I get at least 4 IDs that return no data when queried; these IDs represent the counters associated with the average number of read and write operations. For a cluster I'm missing the CPU usage, and so on.
So I was wondering: when does this happen? Shouldn't every metric returned by queryAvailablePerfMetric report data? What am I missing here?
Minimal code snippet:
// VMWare credentials
String vmwareUrl = args[0];
String vmwareUsername = args[1];
String vmwarePassword = args[2];
// connect to vCenter
ServiceInstance si = new ServiceInstance(new URL(vmwareUrl), vmwareUsername, vmwarePassword, true);
// get performance manager
PerformanceManager perfMgr = si.getPerformanceManager();
// define the time window (the last one hour)
Calendar calTo = Calendar.getInstance();
Calendar calFrom = Calendar.getInstance();
calFrom.setTime(calTo.getTime());
calFrom.add(Calendar.HOUR, -1);
// get any datastore for testing purposes
Folder rootFolder = si.getRootFolder();
ManagedEntity[] datastores = new InventoryNavigator(rootFolder).searchManagedEntities("Datastore");
ManagedEntity me = datastores[1];
// query all available metrics for the entity
PerfMetricId[] availablePmis = perfMgr.queryAvailablePerfMetric(me, calFrom, calTo, perfMgr.getHistoricalInterval()[0].getSamplingPeriod());
// create PerfQuerySpec
PerfQuerySpec qSpec = new PerfQuerySpec();
qSpec.setEntity(me.getMOR());
qSpec.setMetricId(availablePmis);
qSpec.setFormat("csv");
qSpec.setStartTime(calFrom);
qSpec.setEndTime(calTo);
// query perf
PerfEntityMetricBase[] perfValues = perfMgr.queryPerf(new PerfQuerySpec[]{qSpec});
// Printing
System.out.println("Found pmis (CounterIDs only): ");
for (PerfMetricId pmi : availablePmis){
System.out.print(pmi.getCounterId() + ", ");
}
System.out.print("\nPmis with values:");
int pmisCount=0;
for (PerfEntityMetricBase value : perfValues) {
PerfMetricSeriesCSV[] csvValues = ((PerfEntityMetricCSV) value).getValue();
pmisCount += csvValues.length;
for (PerfMetricSeriesCSV csv : csvValues) {
System.out.println("Counter ID: " + csv.getId().getCounterId() + " ---- Metric instance: " + csv.getId().getInstance());
System.out.println("\tInfo: " + ((PerfEntityMetricCSV) value).getSampleInfoCSV());
System.out.println("\tValues: " + csv.getValue());
}
}
System.out.println("---------------");
System.out.println("Detected PMIs: " + availablePmis.length);
System.out.println("PMIs with values: " + pmisCount);
Any help (or discussion) would be appreciated.
I am integrating two systems and have to insert data from one client table into a table on another server, once per week, without any business logic or data modification. Each time it runs I have to insert all the data. So I wrote the Camel configuration attached below as screenshots. It works for a small amount of data, but when the client table has more than 20,000 rows I get the exception java.lang.OutOfMemoryError: GC overhead limit exceeded. I tried changing the Java memory settings, e.g. "set JAVA_OPTS=-Dfile.encoding=UTF-8 -Xms2048m -Xmx16384m -XX:PermSize=1012m -XX:MaxPermSize=2048m -XX:+UseConcMarkSweepGC -XX:-UseGCOverheadLimit", but that does not help.
[screenshots of the Camel configuration]
I am working on the same kind of tool.
To query and then insert large tables I am using the Camel JDBC StreamList output type.
In this example I work with two data sources: dataSource1 for the query and dataSource2 for the insert.
The query against dataSource1 is processed as a stream and the body is split row by row, so not all data is held in memory at once. Each row is then written to a file, and a prepared insert statement is built and executed against dataSource2.
private void createRoute(String tableName) {
this.from("timer://timer1?repeatCount=1") //
.setBody(this.simple("delete from " + tableName)) //
.to("jdbc:dataSource2") //
.setBody(this.simple("select * from " + tableName)) //
.to("jdbc:dataSource1?outputType=StreamList") //
// .to("log:stream") //
.split(this.body())//
.streaming() //
// .to("log:row") //
.multicast().to("direct:file", "direct:insert") //
.end().log("End");
this.from("direct:file")
// .marshal(jsonFormat) //
.marshal(new CsvDataFormat().setDelimiter(';'))//
.to("stream:file?fileName=output/" + tableName + ".txt").log("Data: ${body}");
this.from("direct:insert").process(new Processor() {
@Override
public void process(Exchange exchange) throws Exception {
StringBuilder insert = new StringBuilder("insert into ").append(tableName).append(" (");
StringBuilder values = new StringBuilder("values(");
LinkedHashMap<?, ?> body = (LinkedHashMap<?, ?>) exchange.getIn().getBody();
Iterator<String> i = (Iterator<String>) body.keySet().iterator();
while (i.hasNext()) {
String key = i.next();
insert.append(key);
values.append(":?" + key);
exchange.getOut().getHeaders().put(key, body.get(key));
if (i.hasNext()) {
insert.append(",");
values.append(",");
} else {
insert.append(") ");
values.append(") ");
}
}
String sql = insert.append(values).toString();
exchange.getOut().setBody(sql);
}
}) //
.log("SQL: ${body}")//
.to("jdbc:dataSource2?useHeadersAsParameters=true");
I am using an FT search on the $MessageID field to retrieve the parent document. My database is FT indexed. I need the parent document in order to accept the meeting invitation; however, I am only able to retrieve the document about 2 hours after receiving the meeting invitation. I need help.
String messageiD="<OFF0E85FF0.91FEF356-ON65257C97.00360343-65257C97.00361318#LocalDomain>";
if (messageiD.contains("#")) {
String[] strArr = messageiD.split("#");
messageiD = strArr[0].replace("<", "");
System.out.println("message id is "+messageiD);
//return messageiD;
}
String qry = "Field $MessageID CONTAINS " + messageiD;
DocumentCollection col1 = m_database.FTSearch(qry);
System.out.println("doc col length is " +col1.getCount());
Document docOld = col1.getFirstDocument();
System.out.println(docOld.getNoteID());
If you are only able to retrieve the result one or two hours later, then the FT index is not up to date when your request is processed. Use the updateFTIndex() method of the Database class to make sure it is up to date. Of course, you can check whether it IS up to date using the getLastFTIndexed() method. Here is example code from the Designer Help showing how to use these two methods:
try {
Session session = getSession();
AgentContext agentContext =
session.getAgentContext();
// (Your code goes here)
Database db = agentContext.getCurrentDatabase();
String title = db.getTitle();
DateTime lastDT = db.getLastFTIndexed();
DateTime nowDT = session.createDateTime("Today");
nowDT.setNow();
int daysSince =
nowDT.timeDifference(lastDT) / 86400;
if (daysSince > 2) {
System.out.println("Database \"" + title +
"\" was last full-text indexed " +
daysSince + " days ago");
System.out.println("Updating");
db.updateFTIndex(true); }
else
System.out.println("Database \"" + title +
"\" was full-text indexed less
than two days ago");
} catch(Exception e) {
e.printStackTrace();
}
Additional information: when creating a full-text index for a database, you define how often the index is updated.
But even when you select "Immediate" in the dialog, this does not mean that the index will always be up to date. Updating the full-text index is a job of the server's Update task. If this task is too busy, the request is queued and may be delayed for some time until resources are available to do the job.
The performance of full-text index updates can be improved by the server admin by setting the notes.ini variable UPDATE_FULLTEXT_THREAD (see the documentation on that variable for details).
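As a sketch, the notes.ini entry would be a single line like the one below (assuming the value 1, which as far as I know gives the Update task a dedicated thread for full-text indexing):
UPDATE_FULLTEXT_THREAD=1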
I run the following code on my local (Mac) machine and on a remote Unix server:
public void deleteValue(final String id, final String value) {
log.info("Removing value " + value);
final Collection<String> valuesBeforeRemoval = getValues(id);
final MutationBatch m = keyspace.prepareMutationBatch();
m.withRow(VALUES_CF, id).deleteColumn(value);
try {
m.execute();
} catch (final ConnectionException e) {
log.error("Unable to delete location " + value, e);
}
final Collection<String> valuesAfterRemoval = getValues(id);
if (valuesAfterRemoval.size()!=(valuesBeforeRemoval.size()-1)) {
log.error("value " + value + " was supposed to be removed from list " + valuesBeforeRemoval + " but it wasn't: " + valuesAfterRemoval);
}
...
}
protected Collection<String> getValues(final String id) {
try {
final OperationResult<ColumnList<String>> operationResult = keyspace
.prepareQuery(VALUES_CF).getKey(id).execute();
final ColumnList<String> result = operationResult.getResult();
if (result.isEmpty()) {
log.info("No value found for id: " + id);
return new ArrayList<String>();
}
return result.getColumnNames();
} catch (final ConnectionException e) {
log.error("Unable to retrieve session " + id, e);
}
return new ArrayList<String>();
}
Locally, that line is never executed, which makes sense:
log.error("value " + value + " was supposed to be removed from list " + valuesBeforeRemoval + " but it wasn't: " + valuesAfterRemoval);
but that line is executed on my dev server:
[ERROR] [main] [n.o.w.s.d.SessionDaoCassandraImpl] [2013-03-08 13:12:24,801]
[] - value 3 was supposed to be removed from list [3, 2, 1, 0, 7, 6, 5, 4, 9, 8] but it wasn't: [3, 2, 1, 0, 7, 6, 5, 4, 9, 8]
I am using com.netflix.astyanax.
Both my local machine and the remote dev server connect to the very same Cassandra instance.
Both my local machine and the remote dev server run the very same test: creating a new row family and adding 10 records before one is deleted.
When the error occurs on dev, log.error("Unable to delete location " + value, e); was not executed (i.e. running the deletion command didn't produce any exception).
I am 100% positive that no other code is affecting the content of the database while I am running the test on dev, so this isn't some strange concurrency issue.
What could possibly explain that the deleteColumn(value) request runs without producing any error but still does not remove the column from the database?
ADDITIONAL INFO
Here is how I created the keyspace:
create keyspace sessiondata
with placement_strategy = 'org.apache.cassandra.locator.SimpleStrategy'
and strategy_options = {replication_factor:1};
Here is how I created the column family values, referenced as VALUES_CF in the code above:
create column family values
with comparator = UTF8Type
;
Here is how the keyspace referenced in the java code above is defined:
final AstyanaxContext.Builder contextBuilder = getBuilder();
final AstyanaxContext<Keyspace> keyspaceContext = contextBuilder
.forKeyspace(keyspaceName).buildKeyspace(
ThriftFamilyFactory.getInstance());
keyspaceContext.start();
keyspace = keyspaceContext.getEntity();
where getBuilder is:
private Builder getBuilder() {
final AstyanaxConfigurationImpl conf = new AstyanaxConfigurationImpl()
.setDiscoveryType(NodeDiscoveryType.NONE)
.setRetryPolicy(new RunOnce());
final ConnectionPoolConfigurationImpl poolConf = new ConnectionPoolConfigurationImpl("MyPool")
.setPort(port)
.setMaxConnsPerHost(1)
.setSeeds(value);
return new AstyanaxContext.Builder()
.forCluster(cluster)
.withAstyanaxConfiguration(conf)
.withConnectionPoolConfiguration(poolConf)
.withConnectionPoolMonitor(new CountingConnectionPoolMonitor());
}
SECOND UPDATE
First, the issues are not solely related to deletes. I observe similar problems when updating records in the database, reading them, and then not being able to read back the updates I just wrote.
Second, I created a test that performs the following operations 100 times:
write a row into Cassandra
update that row in Cassandra
read that row back from Cassandra and check whether it was indeed updated, checking again regularly after delays if it wasn't
What I observe from that test is that:
again, when I run that code locally, all 100 iterations pass right away (no retry ever needed)
when I run that code on the remote server, some of the iterations pass and some fail. When they fail, no matter how large the delay (I wait up to 10 seconds), the test always fails.
At this point, I am really not sure how any cassandra setup could explain this behavior since I connect to the very same server for my tests and since the delays I insert are much larger than any additional latency I may need to run the test when connecting from my local machine.
The only relevant difference seems to be which machine the code is running on.
THIRD UPDATE
If, in the test mentioned in the previous update, I insert a delay between the two writes, the code starts passing when the delay is >= 1,000 ms. A delay of, say, 100 ms doesn't help. I also modified the builder to set the default read and write consistency levels to the most demanding, ALL, and that had no impact on the results (the test still fails about half of the time unless the delay between writes is > 1 s):
final AstyanaxConfigurationImpl conf = new AstyanaxConfigurationImpl()
.setDiscoveryType(NodeDiscoveryType.NONE)
.setRetryPolicy(new RunOnce()).setDefaultReadConsistencyLevel(ConsistencyLevel.CL_ALL).setDefaultWriteConsistencyLevel(ConsistencyLevel.CL_ALL);
To debug, try printing the full row instead of just the column names. By the full row I mean the column name, the column value, and the timestamp. A long shot is that the clocks are wrong on one of your test machines and this is throwing off your tests on the other.
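A minimal sketch of such a debug dump with Astyanax, reusing the keyspace and VALUES_CF from your code, might look like this:
final ColumnList<String> row = keyspace.prepareQuery(VALUES_CF).getKey(id).execute().getResult();
for (Column<String> column : row) {
    // print the column name, its value, and the write timestamp of that cell
    log.info("name=" + column.getName() + " value=" + column.getStringValue()
            + " timestamp=" + column.getTimestamp());
}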
Another thing to double-check is that the value is indeed what you think it is, in both your application and Cassandra. When you retrieve it, print it between delimiters, like println("-" + value + "-"). Before and after the try block for the execute in deleteValue, do a get for only that column, not the entire row. I'm not too sure how to do that in Astyanax; on the cli it would be get[id][value].
Something to keep in mind is that a delete won't fail even if there's nothing to delete. To Cassandra it's a write; the only thing that makes it act as a delete is that, on read, it is the latest timestamped entry for that row/column name.