I am aware that Bigtable supports append and increment operations via ReadModifyWriteRow requests, but I'm wondering whether there is support for, or an alternative way to use, more generic mapping functions where the value from the cell can be accessed and modified within some sort of closure. For instance, bitwise ANDing a long value in a cell:
Function<Long, Long> modifyFunc = f -> f & 10L;
ReadModifyWriteRow
.create("tableName", "rowKey")
.apply("family", "qualifier", modifyFunc);
Doing a mapping like this is not supported by Bigtable, but here is an option you could try. Note that it only works on single-cluster instances, because it relies on the strong consistency that a single cluster provides.
You could add a column that tracks a row version (in addition to the built-in cell versions). You then read the data and the version, modify the value in memory, and perform a checkAndMutate with the version and the new value. Something like this:
Row row = dataClient.readRow(tableId, rowkey);
List<RowCell> cells = row.getCells();

// Get the value and timestamp/version from the cell you are targeting.
RowCell cell = cells.get(...);
long version = cell.getTimestamp();
ByteString value = cell.getValue();

// Do your mapping to the new value.
ByteString newValue = ...;

// Write the new value at a fresh timestamp (Bigtable timestamps are in microseconds).
// A full implementation would also bump VERSION_COLUMN in this same mutation.
long timestamp = System.currentTimeMillis() * 1000;
Mutation mutation =
    Mutation.create().setCell(COLUMN_FAMILY_NAME, COLUMN_NAME, timestamp, newValue);

// Filter on the column that tracks the version to validate that no other
// writer has touched the row since the read. (This assumes the version is
// stored as a string; adjust the encoding to whatever you use.)
Filter filter =
    FILTERS
        .chain()
        .filter(FILTERS.family().exactMatch(COLUMN_FAMILY_NAME))
        .filter(FILTERS.qualifier().exactMatch(VERSION_COLUMN))
        .filter(FILTERS.value().exactMatch(Long.toString(version)));

ConditionalRowMutation conditionalRowMutation =
    ConditionalRowMutation.create(tableId, rowkey).condition(filter).then(mutation);
boolean success = dataClient.checkAndMutateRow(conditionalRowMutation);
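Since checkAndMutateRow returns false when the condition no longer matches (i.e. another writer got in first), you would typically wrap the whole read-modify-write in a retry loop. A minimal sketch, where readTargetCell, applyMapping, and buildConditionalMutation are illustrative helper names (not Bigtable API) standing in for the steps above:

// Hypothetical retry wrapper around the optimistic read-modify-write above.
boolean updated = false;
for (int attempt = 0; attempt < MAX_ATTEMPTS && !updated; attempt++) {
    Row row = dataClient.readRow(tableId, rowkey);
    RowCell cell = readTargetCell(row);                   // pick the cell being mapped
    ByteString newValue = applyMapping(cell.getValue());  // e.g. the bitwise AND from the question
    ConditionalRowMutation crm = buildConditionalMutation(cell, newValue);
    updated = dataClient.checkAndMutateRow(crm);          // false => concurrent write, retry
}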
I want to do a batch insert into Postgres using jOOQ:
List<MyTableRecord> records = new ArrayList<>();
for (Dto dto : dtos) {
Field<Long> sequenceId = SEQUENCE.nextval();
Long id = using(ctx).select(sequenceId).fetchOne(sequenceId);
records.add(mapToRecord(dto, id));
}
using(ctx).batchInsert(records).execute();
The problem is that I am fetching the next sequence value with a separate query for each row.
For a simple insert I can use the Field directly in the statement, like this:
create.insertInto(MY_TABLE, ID, VALUE)   // MY_TABLE stands in for the target table
      .values(SEQUENCE.nextval(), val("William"))
      .execute();
How can I do so with batch insert?
Pre-fetch all the sequence values
You could pre-fetch all the sequence values you need using this:
List<Long> ids = using(ctx)
.select(sequenceId)
.from(generateSeries(1, dtos.size()))
.fetch(sequenceId);
for (int i = 0; i < dtos.size(); i++) {
    records.add(mapToRecord(dtos.get(i), ids.get(i)));
}
using(ctx).batchInsert(records).execute();
This seems like a useful feature to have out of the box, in an RDBMS agnostic way via using(ctx).nextvals(SEQUENCE, dtos.size()). We'll consider this for a future jOOQ version: https://github.com/jOOQ/jOOQ/issues/10658
Don't use records
An alternative is to batch actual INSERT statements instead of Record.insert() calls via batchInsert(). That way, you can put the SEQUENCE.nextval() expression in the statement. See: https://www.jooq.org/doc/latest/manual/sql-execution/batch-execution/
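A rough sketch of that approach, assuming a generated table MY_TABLE with columns ID and VALUE (all names here are made up for illustration):

// Batch distinct INSERT statements so the database evaluates
// SEQUENCE.nextval() once per row.
List<Query> queries = new ArrayList<>();
for (Dto dto : dtos) {
    queries.add(using(ctx)
        .insertInto(MY_TABLE, MY_TABLE.ID, MY_TABLE.VALUE)
        .values(SEQUENCE.nextval(), val(dto.getValue())));
}
using(ctx).batch(queries).execute();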
I am trying to convert the code below to Java 8 streams, but I am unable to figure out how. Here is my code:
List<Row> rows = new ArrayList<>();
BigDecimal previousVal = BigDecimal.ZERO;
for (SequenceHistory ele : histories) {
    Row row = new Row();
    // Running total: add this element's total, treating null as zero.
    previousVal = previousVal.add(ele.getSeqTotal() == null ? BigDecimal.ZERO : ele.getSeqTotal());
    row.setTotal(previousVal);
    rows.add(row);
}
I tried using a class-level variable, but that didn't work because the instance is shared across multiple calls, so the value persisted into other calls as well. Any suggestion would be appreciated.
I don't recommend forcing java-stream here (explanation below).
The problem is that you want to iterate the List<SequenceHistory> while incrementing and reusing another value on each iteration. That increment cannot happen easily inside a lambda expression: BigDecimal is immutable, and you cannot reassign a local variable from within a lambda, because variables used in a lambda expression must be effectively final. For this reason, you can use AtomicReference<T>, which is itself effectively final and therefore qualifies for use in the lambda, while the mutating operations are encapsulated inside it.
AtomicReference<BigDecimal> ref = new AtomicReference<>(BigDecimal.ZERO);
List<Row> rows = histories.stream()
.map(SequenceHistory::getSeqTotal) // get the total
.map(total -> total == null ? BigDecimal.ZERO : total) // value or ZERO
.map(total -> ref.accumulateAndGet(total, BigDecimal::add)) // increment and get
.map(total -> new Row(total)) // create a new Row
.collect(Collectors.toList()); // collect as a List
I changed the setter of the class Row to a constructor for the sake of brevity; otherwise .map(total -> { Row row = new Row(); row.setTotal(total); return row; }) would be used.
Conclusion: this solution also demonstrates why java-stream is not well suited to this kind of processing. The pipeline relies on mutable state shared across operations, so it only behaves correctly for sequential streams and works against the functional style the API is designed for.
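For reference, this is the naive translation that the compiler rejects, which is what forces the AtomicReference workaround (this snippet intentionally does not compile):

BigDecimal previousVal = BigDecimal.ZERO;
List<Row> rows = histories.stream()
        .map(ele -> {
            // javac: "local variables referenced from a lambda expression
            // must be final or effectively final"
            previousVal = previousVal.add(ele.getSeqTotal());
            return new Row(previousVal);
        })
        .collect(Collectors.toList());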
I want to copy data from one HBase table to another using the Java APIs, but I am not able to find one. Is there any Java API to do this?
Thanks.
The following is far from the most optimized way, but from the tone of the question it seems performance is not the critical factor here.
First, you need to set up your HBaseConfiguration and your input/output tables:
Configuration config = HBaseConfiguration.create();
HTable inputTable = new HTable(config, "input_table");
HTable outputTable = new HTable(config, "output_table");
What you want is a "Scan", which lets you perform a range scan. You define the query parameters by adding columns to the Scan object.
Scan scan = new Scan(Bytes.toBytes("smith-"));
scan.addColumn(Bytes.toBytes("personal"), Bytes.toBytes("givenName"));
scan.addColumn(Bytes.toBytes("contactinfo"), Bytes.toBytes("email"));
scan.setFilter(new PageFilter(25));
Now you are ready to invoke the scan object and retrieve results:
ResultScanner scanner = inputTable.getScanner(scan);
for (Result result : scanner) {
putToOutputTable(result);
}
Now, to save to the second table, you can either issue Puts within the for loop or aggregate the results into a List (or similar) for a bulk put; a sketch of the bulk variant follows the method below.
protected void putToOutputTable(Result result) throws IOException {
    // Retrieve the map of families to their most recent qualifiers and values.
    NavigableMap<byte[], NavigableMap<byte[], byte[]>> map = result.getNoVersionMap();
    // Reuse the source row key; everything is specified as byte arrays,
    // since HBase is all about byte arrays.
    Put p = new Put(result.getRow());
    for (Map.Entry<byte[], NavigableMap<byte[], byte[]>> family : map.entrySet()) {
        for (Map.Entry<byte[], byte[]> column : family.getValue().entrySet()) {
            // The column family must already exist in the output table's
            // schema; the qualifier can be anything.
            p.add(family.getKey(), column.getKey(), column.getValue());
        }
    }
    outputTable.put(p);
}
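The bulk variant mentioned above could look like this, where resultToPut is a hypothetical helper that performs the same family/qualifier copying as putToOutputTable but returns the Put instead of writing it:

// Collect all Puts first, then write them to the output table in one call.
List<Put> puts = new ArrayList<>();
for (Result result : scanner) {
    puts.add(resultToPut(result)); // hypothetical helper, see putToOutputTable above
}
outputTable.put(puts); // HTable accepts a List<Put> and batches the writes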
If instead you want a more scalable version, take a look at how to use map/reduce to read from input HDFS files / write to output HBase tables here: Hbase Map/Reduce
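As a rough sketch of that route, using the classes in org.apache.hadoop.hbase.mapreduce (the mapper, driver class, and table names below are assumptions for illustration):

// A mapper that turns each scanned Result into a Put for the output table.
public static class CopyMapper extends TableMapper<ImmutableBytesWritable, Put> {
    @Override
    protected void map(ImmutableBytesWritable rowKey, Result result, Context context)
            throws IOException, InterruptedException {
        Put put = new Put(result.getRow());
        for (KeyValue kv : result.raw()) {
            put.add(kv); // copy every cell verbatim, timestamps included
        }
        context.write(rowKey, put);
    }
}

// Driver: a map-only job writing straight to the output table.
Configuration config = HBaseConfiguration.create();
Job job = Job.getInstance(config, "copy-table");
job.setJarByClass(CopyMapper.class);

Scan scan = new Scan();
scan.setCaching(500);        // fetch more rows per RPC
scan.setCacheBlocks(false);  // recommended off for full-table MR scans

TableMapReduceUtil.initTableMapperJob(
    "input_table", scan, CopyMapper.class,
    ImmutableBytesWritable.class, Put.class, job);
TableMapReduceUtil.initTableReducerJob(
    "output_table", null, job); // null reducer: just sets up TableOutputFormat
job.setNumReduceTasks(0);
job.waitForCompletion(true);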
Consider a collection of objects having fields like:
{
    id:    // String
    type:  // Integer
    score: // Double value
}
I would like to query the collection by type and, for the returned documents, divide their scores by their maximum. Consider the following query object:
DBObject searchQuery = new BasicDBObject("type", 2);
collection.find(searchQuery);
The above query will return some documents. I want to get the maximum score among all those documents and then divide each of those documents' scores by that maximum.
How can I do this?
I could find the maximum using aggregation as follows:
String propertyToOperateOn = "score";
DBObject match = new BasicDBObject("$match", searchQuery);
DBObject groups = new BasicDBObject("_id", null);
DBObject operation = new BasicDBObject("$max", "$" + propertyToOperateOn);
groups.put("maximum", operation);
DBObject apply = new BasicDBObject("$group", groups);
AggregationOutput output = mongoConstants.IAScores.aggregate(match, apply);
Here output will contain the maximum value. But then how can I update (divide) all the documents' scores by this maximum?
I hope there is a better way to do this task, but I'm unable to find it as I'm very new to MongoDB (or to any database, for that matter).
This is technically the same issue as "mongodb: java: How to update a field in MongoDB using expression with existing value", but I'll repeat the answer:
At the moment, MongoDB doesn't allow you to update the value of a field according to an existing value of a field. Which means, you can't do the following SQL:
UPDATE foo SET field1 = field1 / 2;
In MongoDB, you will need to do this in your application, but be aware that this is no longer an atomic operation as you need to read and then write.
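With the legacy driver used above, the read-then-write could look roughly like this (reusing collection, searchQuery, and the aggregation output from the question; again, this is not atomic):

// Pull the maximum computed by the $group stage.
DBObject aggregated = output.results().iterator().next();
double max = ((Number) aggregated.get("maximum")).doubleValue();

// Re-query the matching documents and rewrite each score in the application.
DBCursor cursor = collection.find(searchQuery);
while (cursor.hasNext()) {
    DBObject doc = cursor.next();
    double newScore = ((Number) doc.get("score")).doubleValue() / max;
    collection.update(
        new BasicDBObject("_id", doc.get("_id")),
        new BasicDBObject("$set", new BasicDBObject("score", newScore)));
}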
Using the HBase API (Get/Put) or the HBQL API, is it possible to retrieve the timestamp of a particular column?
Assuming your client is configured and you have a table set up, doing a Get returns a Result:
Get get = new Get(Bytes.toBytes("row_key"));
Result result_foo = table.get(get);
A Result is backed by KeyValues, and KeyValues contain the timestamps. You can get either a list of KeyValues with list() or an array with raw(). A KeyValue has a getTimestamp() method:
result_foo.raw()[0].getTimestamp()
I think the following is better:
KeyValue kv = result.getColumnLatest(family, qualifier);
// Note: getColumnLatest returns null if the column has no cells.
String status = Bytes.toString(kv.getValue());
Long timestamp = kv.getTimestamp();
since Result#getValue(family, qualifier) is implemented as
public byte[] getValue(byte[] family, byte[] qualifier) {
    KeyValue kv = this.getColumnLatest(family, qualifier);
    return kv == null ? null : kv.getValue();
}
@codingFoo's answer assumes all cells share the same timestamp, but the OP asked about a specific column. In that respect, similar to @peibin wang's answer, I would propose the following if you want the latest timestamp of your column:
Use the getColumnLatestCell method on your Result object, and then call the getTimestamp method like so:
Result res = ...
res.getColumnLatestCell(Bytes.toBytes("column_family"), Bytes.toBytes("column_qualifier")).getTimestamp();
If you want access to a specific timestamp, you could use getColumnCells, which returns all cells for a specified column; you then have to pick one of the cells with get(int index) and call getTimestamp() on it.
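For example (column names are illustrative; getColumnCells returns the cells ordered newest first):

List<Cell> cells = res.getColumnCells(
    Bytes.toBytes("column_family"), Bytes.toBytes("column_qualifier"));
if (!cells.isEmpty()) {
    long newest = cells.get(0).getTimestamp();                // latest version
    long oldest = cells.get(cells.size() - 1).getTimestamp(); // oldest returned version
}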
result_foo.rawCells()[0].getTimestamp()
is a good style