I am using the HBase Java/Scala API and have a list of row ids, List(1, 2, 5), and I want to retrieve the rows corresponding to those ids at the same time.
val getOne = new Get(Bytes.toBytes("1"))
lets me access only one row, and I don't want to do get(1), get(2), and get(5) sequentially (latency).
How would I do it all at once and iterate through the returned set later?
If the API does not offer it, what is the next best way?
So the answer is easy: HTable offers an overload that takes a list of gets.
You just have to construct multiple "Get" requests and use the below method from HTable:
Result[] get(List<Get> gets) throws IOException
Link to the javadoc is here.
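For illustration, a minimal sketch in Java, assuming an open Table handle named table:
import java.util.ArrayList;
import java.util.List;
import org.apache.hadoop.hbase.client.Get;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.client.Table;
import org.apache.hadoop.hbase.util.Bytes;

// Build one Get per row id, then issue them all in a single batched call.
List<Get> gets = new ArrayList<>();
for (String id : new String[] { "1", "2", "5" }) {
    gets.add(new Get(Bytes.toBytes(id)));
}
Result[] results = table.get(gets); // one round trip instead of three
for (Result r : results) {
    if (!r.isEmpty()) {
        // process the row, e.g. r.getValue(family, qualifier)
    }
}
This keeps the client-side latency to a single batched round trip rather than one per row id.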
I am querying data from a time series database (Influx in my case) using Java.
I have approximately 20,000-100,000 values (Strings) in the database.
Mapping the results that I get via the Influx Java API to my domain objects seems to be very inefficient (ca. 0.5 s on a small machine).
I suppose this is due to the "resource intensive" object creation of the domain objects.
I am currently using the Streams API:
QueryResult series = result.getResults().get(0).getSeries().get(0);
List<ItemHistoryEntity> mappedList = series.getValues().stream().parallel().map(valueList ->
new ItemHistoryEntity(valueList)).collect(Collectors.toList());
Unfortunately, downsampling my data at the database is not an option in my case.
How can I do this more efficiently in Java?
EDIT:
The next thing I will do with the list is downsampling. The problem is that for further downsampling I need the oldest timestamp in the list, and to get this timestamp I need to iterate the full list. Would it be more efficient to never call Collectors.toList() until I have reduced the size of the list, even though I then need to iterate it at least twice? Or should I find the oldest timestamp using an additional db query, and then iterate the list only once and call the collector only for the reduced list?
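For what it's worth, a single imperative pass can do the mapping and track the oldest timestamp at the same time, avoiding a second full iteration. A minimal sketch, assuming ItemHistoryEntity exposes a numeric getTimestamp() accessor (a hypothetical name):
import java.util.ArrayList;
import java.util.List;

List<List<Object>> values = series.getValues(); // as in the snippet above
List<ItemHistoryEntity> mappedList = new ArrayList<>(values.size());
long oldestTimestamp = Long.MAX_VALUE;
for (List<Object> valueList : values) {
    ItemHistoryEntity entity = new ItemHistoryEntity(valueList);
    // Track the minimum while mapping, so no second iteration is needed.
    oldestTimestamp = Math.min(oldestTimestamp, entity.getTimestamp());
    mappedList.add(entity);
}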
We have some old code that queries DynamoDB to find a list of matching records.
Sample code below:
final DynamoDBQueryExpression<MyObject> queryExp = new DynamoDBQueryExpression<MyObject>()
.withHashKeyValues(myObject)
.withIndexName(indexName)
.withScanIndexForward(false)
.withConsistentRead(true)
.withLimit(rowsPerPage);
final PaginatedQueryList<MyObject> ruleInstanceList = dynamoDBMapper.query(MyObject.class, queryExp);
This is a slow operation, since the query returns a list of matching MyObject instances, and I noticed that all we use it for is to check whether this list is empty or not.
So what I want to do is query for just the first element, or use a different kind of query that simply checks whether the count is greater than 0; all I need to verify is that a matching record exists, so that I can reduce the latency.
My question is, how do I do this?
The documentation for getLimit() indicates:
Note that when calling DynamoDBMapper.query, multiple requests are made to DynamoDB if needed to retrieve the entire result set. Setting this will limit the number of items retrieved by each request, NOT the total number of results that will be retrieved. Use DynamoDBMapper.queryPage to retrieve a single page of items from DynamoDB.
To limit the number of results, you can use queryPage() instead of query(), and apply withLimit(1) to your query expression.
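A minimal sketch of the resulting existence check, reusing the names from the question:
import com.amazonaws.services.dynamodbv2.datamodeling.DynamoDBQueryExpression;
import com.amazonaws.services.dynamodbv2.datamodeling.QueryResultPage;

final DynamoDBQueryExpression<MyObject> queryExp = new DynamoDBQueryExpression<MyObject>()
        .withHashKeyValues(myObject)
        .withIndexName(indexName)
        .withConsistentRead(true) // only valid if indexName is a local secondary index
        .withLimit(1);            // queryPage makes a single request, so at most one item comes back
final QueryResultPage<MyObject> page = dynamoDBMapper.queryPage(MyObject.class, queryExp);
final boolean exists = !page.getResults().isEmpty();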
My DB table consists of multiple rows whose ids are unique.
API (endpoint) -> get the rows for the ids
I am passing an array of inputs (id1, id2, id3, id4).
Question: using DynamoDBMapper, how do I write a single query fetching all the rows for the ids that were passed in?
We can use either scan or query.
Appreciate your help. Thanks in advance.
Scan or Query is not suitable for this transaction.
You should iterate your list and use GetItem to retrieve each item individually, which is the fastest and cheapest way to get the items. You can also use BatchGetItem if you wish to perform concurrent requests; a sketch of that route follows below.
A Scan would be slow and expensive, as it would evaluate every single item in your table. However, if you insist on using it, simply scan your table and provide a ScanFilter to return your items.
If you used a Query, it would operate in exactly the same way as GetItem anyway: you would have to iterate your list of IDs. In other words, a Query is not at all suitable in this case.
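A minimal sketch of the BatchGetItem route via DynamoDBMapper.batchLoad, assuming a mapped class Entity whose hash key attribute is id (hypothetical names):
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;

// One key-only object per id; batchLoad issues BatchGetItem under the hood.
List<Entity> keys = new ArrayList<>();
for (String id : Arrays.asList("id1", "id2", "id3", "id4")) {
    Entity key = new Entity();
    key.setId(id);
    keys.add(key);
}
// The result map is keyed by table name.
Map<String, List<Object>> results = dynamoDBMapper.batchLoad(keys);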
I achieved it in a single call using dynamoDBMapper.scan. Example as follows:
private List<Entity> getByIds(List<UUID> ids) {
    List<AttributeValue> attList = ids.stream()
            .map(id -> new AttributeValue(id.toString()))
            .collect(Collectors.toList());
    DynamoDBScanExpression dynamoDBScanExpression = new DynamoDBScanExpression()
            .withFilterConditionEntry("id", new Condition()
                    .withComparisonOperator(ComparisonOperator.IN)
                    .withAttributeValueList(attList));
    // Note: the filter is applied after the scan reads the table, so every
    // item is still evaluated; only the matching ones are returned.
    PaginatedScanList<Entity> list = dynamoDBMapper.scan(Entity.class, dynamoDBScanExpression);
    return new ArrayList<>(list);
}
I want to asynchronously read a number of documents from a Couchbase bucket. This is my code:
JsonDocument student = bucketStudent.get(studentID);
The problem is that for a large data file with a lot of studentIDs, it takes a long time to get all the documents, because the get() method is called once per studentID. Is it possible to pass a list of studentIDs as input and get back a list of students, instead of fetching a single document for each studentID?
If you are running a query node, you can use N1QL for this. Your query would look like this:
SELECT * FROM myBucket USE KEYS ["key1", "key2", "key3"]
In practice you would probably pass in the array of strings as a parameter, like this:
SELECT * FROM myBucket USE KEYS ?
You will need a primary index for your bucket, or queries like this won't work.
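A minimal sketch of executing the parameterized form from the Java SDK (the 2.x API), assuming an opened Bucket named bucket:
import com.couchbase.client.java.document.json.JsonArray;
import com.couchbase.client.java.query.N1qlQuery;
import com.couchbase.client.java.query.N1qlQueryResult;
import com.couchbase.client.java.query.N1qlQueryRow;

// The single positional parameter is the whole array of keys.
JsonArray keys = JsonArray.from("key1", "key2", "key3");
N1qlQuery query = N1qlQuery.parameterized(
        "SELECT * FROM myBucket USE KEYS ?",
        JsonArray.from(keys));
N1qlQueryResult result = bucket.query(query);
for (N1qlQueryRow row : result) {
    System.out.println(row.value());
}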
AFAIK the Couchbase SDK does not have a native function for a bulk get operation.
The node.js SDK has a getMulti method, but it's basically an iteration over an array and then get() is fired for each element.
I've found in my applications that the key-value approach is still faster than the SELECT * on a primary index but the N1QL query is remarkably close (on couchbase 5.x).
Just a quick tip: if you have A LOT of ids to fetch and you decide to go with the N1QL queries, try to split that list into smaller chunks. It speeds up the queries, and you can actually manage your errors better and avoid getting some nasty timeouts.
Retrieving multiple documents using the document IDs is not supported by default in the Couchbase Java SDK. To achieve that you'll need to use a N1QL query like the one below:
SELECT S.* FROM Student S USE KEYS ["StudentID1", "StudentID2", "StudentID3"]
which would return an array of documents with the given IDs. Construct the query with com.couchbase.client.java.query.N1qlQuery, and use either of the below to execute it.
If you're using Spring's CouchbaseTemplate, you can use the below
List<T> findByN1QL(com.couchbase.client.java.query.N1qlQuery n1ql,
Class<T> entityClass)
If you're using Couchbase's Java SDK, you can use the below
N1qlQueryResult query(N1qlQuery query)
Note that you'll need an index on your bucket to run N1QL queries.
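For example, a sketch of the CouchbaseTemplate route, assuming a mapped Student entity class (hypothetical); note that Spring Data's findByN1QL needs the document id and CAS projected as _ID and _CAS for entity mapping:
import com.couchbase.client.java.query.N1qlQuery;
import java.util.List;

N1qlQuery query = N1qlQuery.simple(
        "SELECT META(S).id AS _ID, META(S).cas AS _CAS, S.* "
      + "FROM Student S USE KEYS [\"StudentID1\", \"StudentID2\", \"StudentID3\"]");
List<Student> students = couchbaseTemplate.findByN1QL(query, Student.class);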
It is possible now. The Java SDK gives the capability to do a multi get. It is present in two flavours:
Async bulk get
N1QL query (does not work with binary documents)
The Couchbase documentation suggests using the async bulk get, but that has an additional dependency on the reactive Java client. You can see the official documentation here.
There are several tutorials explaining the usage (link).
Here is how a sample get would look with Java SDK 3.2:
List<String> docsToFetch = Arrays.asList("airline_112", "airline_1191", "airline_1203");
Map<String, GetResult> successfulResults = new ConcurrentHashMap<>();
Map<String, Throwable> erroredResults = new ConcurrentHashMap<>();

// Fire all gets concurrently; collect successes and failures separately.
Flux.fromIterable(docsToFetch)
        .flatMap(key -> reactiveCollection.get(key)
                .onErrorResume(e -> {
                    erroredResults.put(key, e);
                    return Mono.empty(); // swallow the error so the stream continues
                })
                .doOnNext(getResult -> successfulResults.put(key, getResult)))
        .last()
        .block(); // wait until every get has completed
Source.
When I am executing a SQLite query (using sqlite4java) I follow the general scheme of executing it step by step and obtaining one row at a time. My final result should be a 2-D array whose length corresponds to the number of records. The problem I am facing is that I don't know in advance how many records my query will return, so I basically store them in an ArrayList and then copy the pointers to the actual array. Is there a technique to somehow obtain the number of records a query will return prior to executing it fully?
My final result should be a 2-D array whose length corresponds to the number of records.
Why? It would generally be a better idea to make the result a List<E> where E is some custom type representing "a record".
It sounds like you're already creating an ArrayList - so why do you need an actual array? The Collections API is generally more flexible and convenient than using arrays directly.
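A minimal sketch of that approach, assuming a prepared sqlite4java SQLiteStatement named statement and hypothetical columns:
import java.util.ArrayList;
import java.util.List;

// One small value type per row instead of a String[] slice of a 2-D array.
final class StudentRow {
    final String name;
    final int score;
    StudentRow(String name, int score) {
        this.name = name;
        this.score = score;
    }
}

List<StudentRow> rows = new ArrayList<>();
while (statement.step()) { // step() returns true while rows remain
    rows.add(new StudentRow(statement.columnString(0), statement.columnInt(1)));
}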
No, if using JDBC. You can, however, first do a COUNT() query, and then the real query.
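A minimal sketch of that two-query approach with sqlite4java, using a hypothetical table and filter (db is an open SQLiteConnection):
import com.almworks.sqlite4java.SQLiteConnection;
import com.almworks.sqlite4java.SQLiteStatement;

// First query: how many rows will the real query return?
SQLiteStatement count = db.prepare("SELECT COUNT(*) FROM records WHERE score > ?");
int rowCount;
try {
    count.bind(1, 10);
    count.step();
    rowCount = count.columnInt(0);
} finally {
    count.dispose();
}

// Second query: the array can now be sized up front.
String[][] result = new String[rowCount][];
SQLiteStatement st = db.prepare("SELECT name, value FROM records WHERE score > ?");
try {
    st.bind(1, 10);
    for (int i = 0; st.step(); i++) {
        result[i] = new String[] { st.columnString(0), st.columnString(1) };
    }
} finally {
    st.dispose();
}
Note that the count can go stale if the table changes between the two queries.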