Partial JSON update on DynamoDB - java

I have a simple dynamo table that consists of cookies and attributes:
customer
cookie
attribute_1
attribute_2
...
attribute_n
Right now, these attributes are variable and need to be updated upon receiving a partial JSON through an endpoint.
I've made up my mind to use the new JSON document type in DynamoDB (since that's our main datastore choice), and I intend to reshape the table into:
customer
cookie
attributes
Where attributes is just a JSON document.
Main issues:
I have no way of knowing which attributes are going to be added
I have no way of knowing which items already exist (short of making an extra query)
I'd like to avoid writing super-complex code to do this
Main goal:
In an ideal world, whether or not the item already exists in Dynamo, I could pass the primary key along with some JSON and have the DB partially update the existing JSON.
So far I've seen this kind of code:
DynamoDB dynamo = new DynamoDB(new AmazonDynamoDBClient(...));
Table table = dynamo.getTable("people");
table.updateItem(new UpdateItemSpec()
        .withPrimaryKey("person_id", 123)
        .withUpdateExpression("SET document.current_city = :city")
        .withValueMap(new ValueMap().withString(":city", "Seattle")));
But I'd like to avoid making an extra query (to know whether I need to create or update) and having to construct all the update expressions by hand.
Is there a way to do this?
Here is a full example just in case:
1) Receive the following JSON in the API:
{"name": "John"}
Expected dynamo attribute:
attributes={"name": "John"}
2) Receive the following JSON in the API:
{"age": 12}
Expected dynamo attribute:
attributes={"name": "John", "age": 12}
And so on. The primary key is constructed from the request cookie / customer.
My hope that this exists comes from the fact that Dynamo supports the smart updateItem (which I'm currently using), which lets you specify only some attributes to update, or create the item if it doesn't exist.
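For the expression-building half of the problem, here is a minimal sketch (names such as cookieValue are hypothetical) that assembles the UpdateExpression dynamically from whatever keys arrive in the request. Note that it still fails when the attributes map doesn't exist yet on the item, which is exactly the put-if-absent problem discussed in the related questions below:
// Hypothetical input, e.g. the partial JSON {"name": "John"} parsed into a map
Map<String, Object> incoming = new LinkedHashMap<String, Object>();
incoming.put("name", "John");

StringBuilder expr = new StringBuilder("SET ");
NameMap names = new NameMap();
ValueMap values = new ValueMap();
int i = 0;
for (Map.Entry<String, Object> entry : incoming.entrySet()) {
    if (i > 0) expr.append(", ");
    // Builds: attributes.#k0 = :v0, attributes.#k1 = :v1, ...
    expr.append("attributes.#k").append(i).append(" = :v").append(i);
    names.with("#k" + i, entry.getKey());
    values.with(":v" + i, entry.getValue());
    i++;
}
table.updateItem(new UpdateItemSpec()
        .withPrimaryKey("cookie", cookieValue)
        .withUpdateExpression(expr.toString())
        .withNameMap(names)
        .withValueMap(values));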

Related

Lucidworks Fusion 4.1 transform result documents using Javascript query pipeline

How can I transform the Solr response using a JavaScript query pipeline in Lucidworks Fusion 4.1? For example, I have the following response:
[
  { "doc_type": "type1",
    "publicationDate": "2018/10/10",
    "sortDate": "2017/9/9" },
  { "doc_type": "type2",
    "publicationDate": "2018/5/5",
    "sortDate": "2017/12/12" }
]
And I need to change it with the following conditions:
If doc_type = type1, then put sortDate into publicationDate and remove sortDate; otherwise, just remove sortDate.
How can I manipulate the response? There is no documentation on the official website.
Currently, you cannot modify the Solr response. All you can do is add to it. So you could add a new block of JSON, include the "id" of the item and then list the fields and values you want to use in your UI.
Otherwise, you need to make the change in your Index Pipeline (as long as the value doesn't need to change based on the query).
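The Index Pipeline route might look something like the rough sketch below, written as a Fusion JavaScript index stage; the PipelineDocument method names (getFirstFieldValue, addField, removeFields) are assumptions to verify against your Fusion version's documentation:
function (doc) {
  // If the document is type1, overwrite publicationDate with sortDate.
  if (doc.getFirstFieldValue('doc_type') === 'type1') {
    var sortDate = doc.getFirstFieldValue('sortDate');
    doc.removeFields('publicationDate');
    doc.addField('publicationDate', sortDate);
  }
  // In all cases, drop sortDate before indexing.
  doc.removeFields('sortDate');
  return doc;
}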

kairosdb and elasticsearch integration

I'm using KairosDB as my primary DB. Now I want to integrate Elasticsearch functionality with my data inside KairosDB. As stated in the docs, I have to duplicate all entries of my primary DB inside an Elasticsearch database.
Update
What I mean is that, if I want to index something inside elasticsearch, I have to do, for example:
Retrieve data from KairosDB, for example the JSON {"name": "hi","value": "6","tags"}
and then put it inside Elasticsearch:
curl -XPUT 'http://localhost:9200/firstIndex/test/1' -d '{"name": "hi","value": "6","tags"}'
If I want to search I have to do this:
curl 'http://localhost:9200/_search?q=name:hi&pretty=true'
I'm wondering if it is possible not to duplicate my data inside Elasticsearch, in a way in which I can achieve this:
get data from KairosDB
index it using Elasticsearch without duplicating the data.
How can I go about that?
It sounds like you're hoping to use Elasticsearch as a secondary (and external) fulltext index for your primary datastore (KairosDB).
Since KairosDB is remaining your primary datastore, each record you load into Elasticsearch needs two pieces of information (at minimum):
The primary key field(s) for locating the corresponding KairosDB record(s). In the mapping, make sure to set "store": true, "index": "not_analyzed"
Any fields which you wish to be searchable (in your example, only name is searched) "store": false, "index": "analyzed"
If you want to reduce your index size further, consider disabling the _source field, as sketched just below.
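Such a mapping might look like the following (Elasticsearch 1.x/2.x string-mapping syntax; the index, type, and field names here are assumptions based on the example above):
curl -XPUT 'http://localhost:9200/kairos_index' -d '{
  "mappings": {
    "metric": {
      "_source": { "enabled": false },
      "properties": {
        "kairos_key": { "type": "string", "index": "not_analyzed", "store": true },
        "name": { "type": "string", "index": "analyzed", "store": false }
      }
    }
  }
}'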
Then your search workflow becomes a two-step process:
Query Elasticsearch for name:hi and retrieve the KairosDB primary key field(s) for each of the matching record(s).
Query/return KairosDB time-series data using key fields returned from Elasticsearch.
But to be clear: you don't need an exact duplicate of each KairosDB record loaded into Elasticsearch, just the searchable fields, along with a means to locate the original record in KairosDB.

DynamoDB API: How can I build an "add JSON attribute if not present" update request?

I am trying to use the new Amazon DynamoDB JSON API to add/overwrite key-value pairs in a JSON attribute called "document". Ideally, I would like simply to structure my write calls to send the KV pairs to add to the attribute, and have Dynamo create the attribute if it does not already exist for the given primary key. However, if I try this with just a straightforward UpdateItemSpec:
PrimaryKey primaryKey = new PrimaryKey("key_str", "mapKey");
ValueMap valuesMap = new ValueMap().withLong(":a", 1234L).withLong(":b", 1234L);
UpdateItemSpec updateSpec = new UpdateItemSpec()
        .withPrimaryKey(primaryKey)
        .withUpdateExpression("SET document.value1 = :a, document.value2 = :b")
        .withValueMap(valuesMap);
table.updateItem(updateSpec);
I get com.amazonaws.AmazonServiceException: The document path provided in the update expression is invalid for update, meaning DynamoDB could not find the given attribute named "document" to which to apply the update.
I managed to approximate this functionality with the following series of calls:
try {
    // 1. Attempt UpdateItemSpec as if attribute already exists
    table.updateItem(updateSpec);
} catch (AmazonServiceException e) {
    // 2. Confirm the exception indicated the attribute was not present, otherwise rethrow it
    if (!"ValidationException".equals(e.getErrorCode())) throw e;
    // 3. Use a put-if-absent request to initialize an empty JSON map at the attribute "document"
    table.updateItem(new UpdateItemSpec().withPrimaryKey(primaryKey)
            .withUpdateExpression("SET document = if_not_exists(document, :empty)")
            .withValueMap(new ValueMap().withMap(":empty", new HashMap<String, Object>())));
    // 4. Rerun the UpdateItemSpec call from the above try block
    table.updateItem(updateSpec);
}
This works, but is less than ideal as it will require 3 calls to DynamoDB every time I add a new primary key to the table. I experimented a bit with the attribute_not_exists function that can be used in Update Expressions, but wasn't able to get it to work in the way I want.
Any Dynamo gurus out there have any ideas on how/whether this can be done?
I received an answer from Amazon Support that it is not actually possible to accomplish this with a single call. They did suggest reducing the number of calls when adding the attribute for a new primary key from 3 to 2, by using the desired JSON map in the put-if-absent request rather than an empty map.
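A sketch of that two-call pattern, reusing the names from the snippet above (the error-code check is an assumption; verify what your SDK version returns for an invalid document path):
try {
    table.updateItem(updateSpec); // fails if "document" doesn't exist yet
} catch (AmazonServiceException e) {
    if (!"ValidationException".equals(e.getErrorCode())) throw e;
    // Put-if-absent with the desired contents, so no third call is needed.
    Map<String, Object> desired = new HashMap<String, Object>();
    desired.put("value1", 1234L);
    desired.put("value2", 1234L);
    table.updateItem(new UpdateItemSpec()
            .withPrimaryKey(primaryKey)
            .withUpdateExpression("SET document = if_not_exists(document, :doc)")
            .withValueMap(new ValueMap().withMap(":doc", desired)));
}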

How does no-sql handle relational data?

I know it is a non-relational database, but this does not mean that relational data does not exist.
For example, I have a table that holds urls like this ( simplified ):
url | domain
and I have a table that holds domains like this ( simplified ):
domain | favicon_path
Because many different urls may have the same domain, I did not want to repeat the favicon_path for each url when pulling the data to send to the view.
Hence I use a simple JOIN (simplified for this example) when I need the data:
"SELECT bookmarks.*, domains.favicon FROM bookmarks JOIN
domains ON bookmarks.domain=domains.domain"
How would I handle this scenario using no-sql?
I plan on implementing no-sql using indexedDB on the client ( javascript ) and MongoDB on the server ( java ).
If you want to use a document-oriented DB, you can use this structure of documents:
URL_ID: {
    "domain": "id_of_domain",
    "another_staff": "..."
}
DOMAIN_ID: {
    "favicon_path": "path or id of another document",
    "another_staff": "..."
}
So you can fetch the URL_ID document by id from the database, and then fetch the corresponding Domain document, as in the sketch below.
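With the MongoDB 3.x Java driver, for instance, the two lookups could look like this (database, collection, and variable names are hypothetical; the question mentions MongoDB on the server):
MongoClient client = new MongoClient(); // defaults to localhost:27017
MongoDatabase db = client.getDatabase("app");
// First lookup: the URL document, which carries the domain's id.
Document bookmark = db.getCollection("bookmarks")
        .find(Filters.eq("_id", urlId)).first();
// Second lookup: the Domain document referenced by the URL document.
Document domain = db.getCollection("domains")
        .find(Filters.eq("_id", bookmark.getString("domain"))).first();
String faviconPath = domain.getString("favicon_path");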
ADDITION:
You can use the following approach for generating ids: create a special document (like a sequence) which has only one field, the current value of the sequence. On every insert into the DB you fetch this sequence and increment it. Some DBs, like Couchbase, have low-level support for this mechanism, which is very efficient and thread-safe. A MongoDB sketch follows.
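A sketch of that counter pattern using MongoDB's atomic findOneAndUpdate (the counter document's names are made up for illustration):
// Atomically increment and read the sequence in one round trip;
// upsert(true) creates the counter document on first use.
Document counter = db.getCollection("counters").findOneAndUpdate(
        Filters.eq("_id", "url_id_seq"),
        Updates.inc("current_value", 1L),
        new FindOneAndUpdateOptions().returnDocument(ReturnDocument.AFTER).upsert(true));
long nextId = counter.getLong("current_value");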
From years of work experience in the IT field, I would say most business models can be normalized into two simple types of data structure:
Entity info.
Entity list.
For example, in a book store business, we will have the Book entity, plus many lists containing all of the books or some subset of them.
With a NoSQL database such as Redis or SSDB, the Book entity is stored as a key-value pair, where the key is the book SN and the value is the stringified book info (title, publish date, description, etc.), while the book lists (by publish date, by price, etc.) are stored in the zset data type, as sketched below.
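A sketch of that layout using Jedis against Redis (the key names and book data are made up for illustration):
Jedis jedis = new Jedis("localhost");
// Entity info: key-value, keyed by the book's SN.
jedis.set("book:1001", "{\"title\":\"Example Book\",\"price\":19.9}");
// Entity list: a zset ordered by price, holding the entity keys.
jedis.zadd("books:by_price", 19.9, "book:1001");
// List the ten cheapest books, then fetch each entity blob.
for (String key : jedis.zrange("books:by_price", 0, 9)) {
    System.out.println(jedis.get(key));
}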

Amazon DynamoDB to Get Items with attribute value of... (Java API)

I'm fairly new to Amazon's AWS and its Java API, so I'm not exactly sure what the most efficient method for what I'm trying to do would be. Basically, I'm trying to set up a database that will store a project's ID, its status, and the bucket and location used when it is uploaded to an S3 bucket by a user. What I'm having trouble with is getting a list of all the project IDs that have a value of "ready" in the status attribute. Any projects with status "ready" need to have their ID numbers loaded into an array or ArrayList for later reference. Any recommendations?
The way to do this is to use the Scan API. However, this means Dynamo will need to look at every item in your table and check whether its "status" attribute equals "ready". This operation is expensive: you are charged for reading every item in the table.
The code would look something like this:
Condition scanFilterCondition = new Condition()
        .withComparisonOperator(ComparisonOperator.EQ.toString())
        .withAttributeValueList(new AttributeValue().withS("ready"));
Map<String, Condition> conditions = new HashMap<String, Condition>();
conditions.put("status", scanFilterCondition);
ScanRequest scanRequest = new ScanRequest()
        .withTableName("MasterProductTable")
        .withScanFilter(conditions);
ScanResult result = client.scan(scanRequest);
There is a way to make this better, though it requires denormalizing your data. Try keeping a second table with a hash key of "status", and a range key of "project ID". This is in addition to your existing table. This would allow you to use the Query API (scan's much cheaper cousin), and ask it for all items with a hash key of "ready". This will get you a list of the project IDs you need, and you can then get them from the project ID table you already have.
The code for this would look something like:
// "status" is a DynamoDB reserved word, so it needs a name placeholder.
QueryRequest queryRequest = new QueryRequest()
        .withTableName("ProductByStatus")
        .withKeyConditionExpression("#s = :ready")
        .withExpressionAttributeNames(Collections.singletonMap("#s", "status"))
        .withExpressionAttributeValues(
                Collections.singletonMap(":ready", new AttributeValue().withS("ready")));
QueryResult result = client.query(queryRequest);
The downside to this approach is that you have to update two tables whenever you update the status field, and you have to make sure you keep them in sync. Dynamo doesn't offer transactionality, so you have to be ready for the case where the update to the master project table succeeds but the update to the secondary status table doesn't, or vice versa. A sketch of the dual write follows.
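A hedged sketch of that dual write (the key attribute name projectId is an assumption; error handling and the repair path are elided):
// 1. Update the status on the master item.
Map<String, AttributeValueUpdate> updates = new HashMap<String, AttributeValueUpdate>();
updates.put("status", new AttributeValueUpdate(new AttributeValue().withS("ready"), AttributeAction.PUT));
client.updateItem(new UpdateItemRequest()
        .withTableName("MasterProductTable")
        .addKeyEntry("projectId", new AttributeValue().withS(projectId))
        .withAttributeUpdates(updates));
// 2. Mirror it into the status table; if this call fails, the two
//    tables are out of sync until some repair job reconciles them.
Map<String, AttributeValue> indexItem = new HashMap<String, AttributeValue>();
indexItem.put("status", new AttributeValue().withS("ready"));
indexItem.put("projectId", new AttributeValue().withS(projectId));
client.putItem(new PutItemRequest().withTableName("ProductByStatus").withItem(indexItem));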
For further reference: http://docs.amazonwebservices.com/amazondynamodb/latest/developerguide/QueryAndScan.html
