Efficient use of the GAE DataStore

I am currently developing a Google AppEngine (GAE) application and I am struggling a bit with the GAE DataStore best practices. I would like to use the DataStore in the most efficient way. I am using the Objectify framework, but am flexible to use something else if there is a better alternative.
My application uses three objects/tables:
- Items (id, description)
- List (id, listId, listDescription)
- SecurityProfile (id,listId, username, accessType)
In a relational world, my Items and SecurityProfile tables would have a foreign key linking them to a list (listId), and I would then use joins in my queries.
The typical Queries I need to make:
- Get all lists accessible to a particular user (need an index on "username" to filter by username and need to get the description from the List table)
- Get all items in list for a particular user (get the Items linked to the Lists retrieved in the query above)
I am struggling a bit to come up with a way to link the different objects in an efficient way (minimizing the DataStore queries and indexes).
I have seen in other posts that joins should be avoided and that I should de-normalize the model as much as possible.
So kind of creating one object only:
- Data (id, description, listId, listDescription, username, accessType)
I can see how that works from a read point of view, but if I update a listDescription or an accessType, or add a new username, I could potentially have to update a massive number of records. Is this really the way to go?

I'm only familiar with the Python NDB API, but things are similar in Java.
In Python NDB, I would recommend creating a Model for each of:
- User
- List
- List item
Then you can reference them with repeated KeyProperties, e.g.
from google.appengine.ext import ndb
class SecurityProfiles(ndb.Model):
    accessibleLists = ndb.KeyProperty(repeated=True)  # keys of List entities
class List(ndb.Model):
    listItems = ndb.KeyProperty(repeated=True)  # keys of item entities
This way, you can pull a user's profile from the DataStore and, with the keys stored in accessibleLists, fetch the lists accessible to that user.
Alternatively, you could do it the other way around:
class List(ndb.Model):
    usersWithAccess = ndb.KeyProperty(repeated=True)  # keys of user entities
and then you could immediately query for lists that are accessible to a given user.
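For readers on the Java side, the two shapes above can be sketched with plain collections standing in for the datastore. This is only an in-memory illustration, not Objectify or Datastore code; the class, record and field names (KeyReferenceSketch, ItemList, accessibleListIds and so on) are made up for the example.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// In-memory stand-in for the datastore; every name here is hypothetical.
class KeyReferenceSketch {
    record ItemList(String id, String description, List<String> usersWithAccess) {}
    record SecurityProfile(String username, List<String> accessibleListIds) {}

    static final Map<String, ItemList> LISTS = new LinkedHashMap<>();
    static final Map<String, SecurityProfile> PROFILES = new LinkedHashMap<>();

    static {
        LISTS.put("l1", new ItemList("l1", "Groceries", List.of("alice", "bob")));
        LISTS.put("l2", new ItemList("l2", "Work", List.of("alice")));
        PROFILES.put("alice", new SecurityProfile("alice", List.of("l1", "l2")));
        PROFILES.put("bob", new SecurityProfile("bob", List.of("l1")));
    }

    // Option 1: load the profile by key, then fetch each referenced list by key.
    static List<String> listsViaProfile(String username) {
        List<String> result = new ArrayList<>();
        SecurityProfile profile = PROFILES.get(username);
        if (profile == null) {
            return result;
        }
        for (String listId : profile.accessibleListIds()) {
            result.add(LISTS.get(listId).description());
        }
        return result;
    }

    // Option 2: filter lists on their usersWithAccess field.
    // A real datastore would answer this from an index instead of scanning.
    static List<String> listsViaQuery(String username) {
        List<String> result = new ArrayList<>();
        for (ItemList list : LISTS.values()) {
            if (list.usersWithAccess().contains(username)) {
                result.add(list.description());
            }
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println(listsViaProfile("bob"));
        System.out.println(listsViaQuery("alice"));
    }
}
```

Either way, renaming a list touches only the one list entity, which is the main difference from the fully de-normalized Data object in the question.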

Related

How to implement one-to-many relationships with DynamoDB

Hi everyone! I have two tables that I would like to join in DynamoDB, but since it is not a relational database, I don't know how to map the link between the two tables.
In particular, I have a Price List table and a Detail List table that contains the details of the first one. How can I implement a one-to-many relationship in Java using DynamoDB with Spring Boot?

DynamoDB is basically a key-value store. You only ever perform a lookup based on a key. That key may be artificial (not just a user id, but maybe "user_id#product#order"), but it will still be a key-based lookup. If you want to use DynamoDB, you have to store the data in such a way that all the queries you will need boil down to basic key-based access (plus some sorting).
You have to do the exact opposite of normalizing your data and splitting relations into multiple tables: you have to de-normalize all your data to store the data and all the relations just in one table, multiple times, with multiple complex artificial keys. See e.g. https://www.youtube.com/watch?v=HaEPXoXVf2k on how to use LSIs, GSIs, how to model your data, how to choose artificial keys, etc.
That means you will not have Item, Order and OrderItem tables that you join together; you will have just one Everything table, which may have the fields: userid, username, ordernumber, itemid, itemprice, itemquantity, itemname, orderdate, shippingaddress, etc.
And if you have three items in an order, you will have three entries in this table. That means the username and the itemname will each appear in the table many times, and changing them will be difficult, but that is how things are if you want to use DynamoDB.
That is how you model one-to-many relations: by packing them into a single table and adding proper indexes.
If you have no idea about the current or future access patterns of your data, or how to structure your data properly, then DynamoDB is the wrong tool for you.
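To make the artificial-key idea concrete for the Price List / Detail List case from the question, here is a small in-memory analogy. It does not use the AWS SDK: a sorted map plays the role of the single table, and the key layout ("PRICELIST#id#DETAIL#id"), class name and method names are all assumptions made up for the sketch.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// In-memory analogy only: a sorted map stands in for one table whose
// composite key is built from an artificial prefix. Names are hypothetical.
class SingleTableSketch {
    private final TreeMap<String, String> table = new TreeMap<>();

    // The parent row and its child rows share the same key prefix.
    void putPriceList(String priceListId, String name) {
        table.put("PRICELIST#" + priceListId + "#META", name);
    }

    void putDetail(String priceListId, String detailId, String payload) {
        table.put("PRICELIST#" + priceListId + "#DETAIL#" + detailId, payload);
    }

    // Plain key lookup for the parent.
    String priceListName(String priceListId) {
        return table.get("PRICELIST#" + priceListId + "#META");
    }

    // One-to-many read: a single range scan over the shared prefix,
    // returned in sort-key order. No join is involved.
    List<String> detailsOf(String priceListId) {
        String prefix = "PRICELIST#" + priceListId + "#DETAIL#";
        List<String> result = new ArrayList<>();
        for (Map.Entry<String, String> e : table.tailMap(prefix, true).entrySet()) {
            if (!e.getKey().startsWith(prefix)) {
                break; // walked past the last key with this prefix
            }
            result.add(e.getValue());
        }
        return result;
    }

    public static void main(String[] args) {
        SingleTableSketch t = new SingleTableSketch();
        t.putPriceList("1", "Spring catalogue");
        t.putDetail("1", "a", "Gadget: 9.99");
        t.putDetail("1", "b", "Widget: 4.50");
        System.out.println(t.priceListName("1") + " -> " + t.detailsOf("1"));
    }
}
```

In real DynamoDB the prefix would typically be split into a partition key and a sort key, but the access pattern is the same: everything is reached through a key or a key range.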

The question you are asking gets at the very essence of working with DynamoDB and NoSQL data modeling. It is not as simple as applying your relational database knowledge to DynamoDB. Take a moment to familiarize yourself with the DynamoDB basics before you get too far into solving this problem.
Watch this video about modeling one-to-many relationships in DynamoDB. I would recommend you watch the entire video from the beginning, as it's one of the best introductions to the topic currently available.

Retrieve information for the same DTO from two different databases

I tried to make this as simple as possible with a short example.
We have two databases, one in MSSQLServer and the other in Progress.
We have the following user DTO, which is shown in a UI table within a web application.
User
int, id
String, name
String, accountNumber
String, street
String, city
String, country
Now this DTO (entity) is not stored in only one database: some fields for the same user are stored in one database and some in the other.
MSSQLServer
Table user
int, id
String, name
String, accountNumber
Progress
Table userModel
int, id
String, street
String, city
String, country
As you can see, the key is the only piece that links the two tables. As I said before, they are not in the same database and do not use the same database vendor.
We have a requirement to sort the UI table by each column. Obviously, we need to build the user DTO with the information coming from both databases.
Our proposal at the moment is this: if the user wants to sort by the street field, we run a query in the Progress database and obtain a page (using pagination), then go to the MSSQLServer user table with the keys from that result set and run another query to extract the missing information, save it to our DTO and transfer it to the UI. This implies running a query in one database and then another query, based on the returned keys, in the second database.
The order of the databases could change depending on which column (field) the user wants to sort by.
Technically, we will create a JpaRepository that acts as a facade and, depending on the field, runs the process against the correct database.
My question is:
Is there some kind of pattern that is commonly used in this scenario? We are using Spring, so Spring probably has some out-of-the-box features to support this requirement. It would be great if this were possible using JpaRepositories (I have several doubts about it, as we will use two different EntityManagers, one for each database).
Note: Moving data from one database to the other is not an option.

For this, you need to have separate DataSource/EntityManagerFactory/JpaRepository.
There is no out-of-the-box support for this architecture in the Spring framework, but you can easily hide the dual DataSource pair behind a Service layer. You can even configure JTA DataSources for ACID operations.
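As a rough sketch of what such a service layer could look like, here is a plain-Java version with no Spring wiring. The two interfaces are hypothetical ports that, in a real application, would each be backed by their own repository and EntityManager; the record and method names are likewise invented for illustration. It follows the flow from the question: page and sort in the database that owns the sort column, then fetch the remaining fields by key from the other one.

```java
import java.util.ArrayList;
import java.util.Collection;
import java.util.List;
import java.util.Map;

class UserFacadeSketch {
    record Account(int id, String name, String accountNumber) {}
    record Address(int id, String street, String city, String country) {}
    record UserDto(int id, String name, String accountNumber,
                   String street, String city, String country) {}

    // Hypothetical port over the database holding street/city/country.
    interface AddressSource {
        List<Address> pageSortedByStreet(int page, int size);
    }

    // Hypothetical port over the database holding name/accountNumber.
    interface AccountSource {
        Map<Integer, Account> findByIds(Collection<Integer> ids);
    }

    static class UserService {
        private final AccountSource accounts;
        private final AddressSource addresses;

        UserService(AccountSource accounts, AddressSource addresses) {
            this.accounts = accounts;
            this.addresses = addresses;
        }

        // Sort and paginate in the database that owns the sort column,
        // then fill in the missing fields by key from the other database.
        List<UserDto> pageSortedByStreet(int page, int size) {
            List<Address> sortedPage = addresses.pageSortedByStreet(page, size);
            List<Integer> ids = new ArrayList<>();
            for (Address a : sortedPage) {
                ids.add(a.id());
            }
            Map<Integer, Account> byId = accounts.findByIds(ids);
            List<UserDto> result = new ArrayList<>();
            for (Address a : sortedPage) { // iterate the page to keep its order
                Account acc = byId.get(a.id());
                if (acc == null) {
                    continue; // no matching row in the other database
                }
                result.add(new UserDto(a.id(), acc.name(), acc.accountNumber(),
                        a.street(), a.city(), a.country()));
            }
            return result;
        }
    }
}
```

Sorting by a column that lives in the other database would mirror this method with the roles of the two sources swapped.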

As you will always need to fetch data from both databases, why not populate local Java User objects and then sort these objects (using a comparator on the fields you want to sort by)?
The advantage of sorting locally vs doing the sort in the database query is that you won't have to send requests to the database every time you change the sorting field.
So, to summarize:
1- Issue two SQL queries, one per database, to get your users
2- Build your User objects using the retrieved values
3- Use Java comparators to sort the users on any field without having to issue new queries to the database.
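Step 3 could look roughly like the sketch below. The User record mirrors the DTO from the question; the class name and the comparatorFor/sortedBy helpers are invented for the example, and the column names are assumed to match the field names.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

class LocalSortSketch {
    // Merged view of one user, built from both databases.
    record User(int id, String name, String accountNumber,
                String street, String city, String country) {}

    // Maps a UI column name to a comparator; unknown columns are rejected.
    static Comparator<User> comparatorFor(String column) {
        return switch (column) {
            case "id" -> Comparator.comparingInt(User::id);
            case "name" -> Comparator.comparing(User::name);
            case "accountNumber" -> Comparator.comparing(User::accountNumber);
            case "street" -> Comparator.comparing(User::street);
            case "city" -> Comparator.comparing(User::city);
            case "country" -> Comparator.comparing(User::country);
            default -> throw new IllegalArgumentException("Unknown column: " + column);
        };
    }

    // Sorts a copy, so the list fetched from the databases can be reused.
    static List<User> sortedBy(List<User> users, String column) {
        List<User> copy = new ArrayList<>(users);
        copy.sort(comparatorFor(column));
        return copy;
    }

    public static void main(String[] args) {
        List<User> users = List.of(
                new User(1, "Zoe", "ACC-2", "Main St", "Oslo", "NO"),
                new User(2, "Adam", "ACC-1", "Elm St", "Turin", "IT"));
        System.out.println(sortedBy(users, "name"));
        System.out.println(sortedBy(users, "street"));
    }
}
```

This assumes the full set of users is small enough to hold in memory; with very large tables, database-side pagination is still needed.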

My advice would be to find a way to link the two databases together so that you can use database driver features without your code being affected.
Essentially, if the Progress database can be linked to SQL Server, you will be able to query both databases using a single SQL query with a join on the id column, and you will get a merged, sorted and paginated result set for your application to display.
I am not an expert in Progress, but it seems there is an ODBC driver for it, so you might try to link it to SQL Server.

Java Search by multiple keywords

I have a list of records containing country, city, district and building name information (more than 50,000 records) where building name is unique for every record.
I want to search building, district & city. But I want to get a list of cities if I pass the country to a method, e.g. get(String country). Or, get a list of districts if I pass country and city to the method, e.g. get(String country, String city).
Is there any existing collection/library/data structure to do something like this? I am thinking of a tree-like structure / Map. I tried MultiKeyMap, but it does not return a list of values and it is not thread-safe. Also, I don't want to use database for doing this.
Thanks in advance for your help.

Solr might do the job you are after:
Solr is the popular, blazing fast open source enterprise search
platform from the Apache Lucene project. Its major features include
powerful full-text search, hit highlighting, faceted search, dynamic
clustering, database integration, rich document (e.g., Word, PDF)
handling, and geospatial search. Solr is highly scalable, providing
distributed search and index replication, and it powers the search and
navigation features of many of the world's largest internet sites...
It should allow you to create queries that search through your records.
You can also interact with Solr through Solrj:
Solrj is a java client to access solr. It offers a java interface to
add, update, and query the solr index.

You can use nested HashMaps, like
HashMap<Country, HashMap<City, HashMap<District, HashMap<Building, Value>>>>
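A minimal sketch of the nested-map idea, with the overloaded get methods from the question, might look like this. It swaps HashMap for the JDK's concurrent sorted collections because the question asks for thread safety; that swap, the class name and the use of plain strings for every level are assumptions of the sketch.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.concurrent.ConcurrentSkipListMap;
import java.util.concurrent.ConcurrentSkipListSet;

// country -> city -> district -> building names, all kept sorted.
class LocationIndexSketch {
    private final Map<String, Map<String, Map<String, Set<String>>>> index =
            new ConcurrentSkipListMap<>();

    void add(String country, String city, String district, String building) {
        index.computeIfAbsent(country, k -> new ConcurrentSkipListMap<>())
             .computeIfAbsent(city, k -> new ConcurrentSkipListMap<>())
             .computeIfAbsent(district, k -> new ConcurrentSkipListSet<>())
             .add(building);
    }

    // Cities of a country.
    List<String> get(String country) {
        return new ArrayList<>(index.getOrDefault(country, Map.of()).keySet());
    }

    // Districts of a city.
    List<String> get(String country, String city) {
        return new ArrayList<>(index.getOrDefault(country, Map.of())
                .getOrDefault(city, Map.of()).keySet());
    }

    // Buildings of a district.
    List<String> get(String country, String city, String district) {
        return new ArrayList<>(index.getOrDefault(country, Map.of())
                .getOrDefault(city, Map.of())
                .getOrDefault(district, Set.of()));
    }

    public static void main(String[] args) {
        LocationIndexSketch idx = new LocationIndexSketch();
        idx.add("UK", "London", "Soho", "Ivy House");
        idx.add("UK", "London", "Camden", "Lock Tower");
        idx.add("UK", "Leeds", "Headingley", "Oak Court");
        System.out.println(idx.get("UK"));
        System.out.println(idx.get("UK", "London"));
    }
}
```

The concurrent collections make individual reads and writes safe across threads, but a reader may still observe a record that is only partly visible while another thread is in the middle of adding related records.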

You could take a look at Apache's Commons CollectionUtils. It has a "select" method that does what you want.

An off-beat approach might be to use .properties files: one per country that refers to a subset of localities, each with its own .properties file that in turn refers to cities, which refer to .properties files containing buildings.
Another may be a class hierarchy with a base class, e.g. GeographicLocation, whose constructor is fed an index used to load an abstract class representing a Region (or, via one of two overloaded methods, to bring back a list of regions if none is indicated), which in turn automatically loads the next abstract class layer, for cities, on top of that.
Inside GeographicLocation class ....
CountryMap cntry = (CountryMap) this;
RegionMap rgion = (RegionMap)cntry;
CityMap cty = (CityMap)rgion;
....e.t.c.

Why not simply use three hashtables (e.g. of the type HashMap<String, List<Record>>): one keyed by buildings, one keyed by city and one keyed by district. Sure, you'll be using about three times as much memory; but 50,000 records really isn't that much. Furthermore, lookups will be really fast and simple. I'd recommend trying this and seeing how it performs.
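Here is a rough sketch of the three-table idea. The Building record and the method names are invented for the example, and it is not thread-safe as written; it also assumes city and district names are unique enough to be used as keys on their own.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class MultiIndexSketch {
    record Building(String country, String city, String district, String name) {}

    // Three independent indexes over the same records.
    private final Map<String, List<Building>> byBuilding = new HashMap<>();
    private final Map<String, List<Building>> byCity = new HashMap<>();
    private final Map<String, List<Building>> byDistrict = new HashMap<>();

    // Every record is registered in all three indexes.
    void add(Building b) {
        byBuilding.computeIfAbsent(b.name(), k -> new ArrayList<>()).add(b);
        byCity.computeIfAbsent(b.city(), k -> new ArrayList<>()).add(b);
        byDistrict.computeIfAbsent(b.district(), k -> new ArrayList<>()).add(b);
    }

    List<Building> findByBuilding(String name) {
        return byBuilding.getOrDefault(name, List.of());
    }

    List<Building> findByCity(String city) {
        return byCity.getOrDefault(city, List.of());
    }

    List<Building> findByDistrict(String district) {
        return byDistrict.getOrDefault(district, List.of());
    }

    public static void main(String[] args) {
        MultiIndexSketch idx = new MultiIndexSketch();
        idx.add(new Building("UK", "London", "Soho", "Ivy House"));
        idx.add(new Building("UK", "London", "Camden", "Lock Tower"));
        System.out.println(idx.findByCity("London"));
        System.out.println(idx.findByDistrict("Soho"));
    }
}
```

The maps store references to the same Building objects, so the extra memory is mostly the map entries rather than three copies of the data.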

Understanding Google App Engine datastore

I am in the early stages of designing a VERY large system (it's an enterprise-level point of sale system). As some of you know, the data models on these things can get very complicated. I want to run this thing on Google App Engine because I want to put more of my resources into developing the software rather than building and maintaining an infrastructure.
In that spirit, I've been doing a lot of reading on GAE and the Datastore. I'm an old-school relational database modeler, and I've seen several different concepts of what a schemaless database is. I think I've figured out what the Datastore is, but I want to make sure I have it right.
So, if I'm right, GAE is a sort of table-based system. So if I create a Java entity
class User {
    public String firstname;
    public String lastname;
}
and deploy it, the "table" user is automatically created and running. Then, in subsequent releases, if I modify the class
import java.util.Date;
class User {
    public String firstname;
    public String lastname;
    public Date addDate;
}
and deploy it, the "table" user is automatically updated with the new field.
Now, in relating data, as I understand it, it's very similar to some of the massively complex systems like SAP, where the data is in fact very organized but, due to the volume, its referential integrity is a function of the application, not the database engine. So I would have code that looks like this
class User {
    public long id;
    public String firstname;
    public String lastname;
}
class Phone {
    public String phonenumber;
    public User userentity;
}
and to pull up the phone numbers for a user from scratch, instead of
select phone from phone inner join user as phone.userentity = user where user.id = 5
(lay off, I know the syntax is incorrect, but you get the point)
I would do something like
select user from user where user.id = 5
then
select phone from phone where phone.userentity = user
and that would retrieve all the phone numbers for the user.
So, as I understand it, it's not so much a huge change in how to think about structuring and organizing data as it is a big change in how to access it. I do joins manually in code instead of automatically with the database engine. Beyond that, it's the same. Am I correct, or am I clueless?

There are really no tables at all. If you make some users with only a first and last name, and then later add addDate, then your original entities will still not have an addDate property. None of the user entities are connected at all, in any way. They are not in a table of Users.
You can access all of the objects you wrote to the database that have the name "User" because App Engine keeps big, long lists (indexes) of all of the objects that have each name. So, any object you put in there that has the name (kind) "User" will get an entry in this list. Later, you can read that index to get the location of each of your objects, and use those locations (keys) to fetch the objects. They are not in a table; they're just floating around. Some of them have some properties in common, but this is a coincidence, not a requirement.
If you want to fetch all of the User objects that have a certain name (Select * from User where firstname="Joe") then you have to maintain another big long index of keys. This index has the firstname property as well as the key of an entity on each row. Later you can scan the index for a certain firstname, get all the keys, and then go look up the actual entities you stored with those keys. All of THOSE entities will have the firstname property (because you wouldn't enter an entity without the firstname property on your firstname index), but they may not have any other fields in common, because they are not in a table that enforces any data structure at all.
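A toy model may help picture this. The sketch below is not how App Engine is implemented and uses no Datastore API; it only mimics the description above with in-memory maps, and every name in it is made up. Entities are schemaless property bags stored under a key, a kind index lists keys per kind, and a property index maps "kind/property/value" rows to keys.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Toy model only; it ignores updates, deletes and index ordering rules.
class IndexSketch {
    // Entities are just property bags stored under a key: no table, no schema.
    private final Map<String, Map<String, Object>> entitiesByKey = new HashMap<>();
    // Kind index: kind -> keys of every entity of that kind.
    private final Map<String, List<String>> kindIndex = new HashMap<>();
    // Property index: "kind/property/value" -> keys of matching entities.
    private final TreeMap<String, List<String>> propertyIndex = new TreeMap<>();

    void put(String kind, String key, Map<String, Object> properties) {
        entitiesByKey.put(key, properties);
        kindIndex.computeIfAbsent(kind, k -> new ArrayList<>()).add(key);
        // Only properties the entity actually has get an index row.
        for (Map.Entry<String, Object> p : properties.entrySet()) {
            String row = kind + "/" + p.getKey() + "/" + p.getValue();
            propertyIndex.computeIfAbsent(row, k -> new ArrayList<>()).add(key);
        }
    }

    List<String> keysOfKind(String kind) {
        return kindIndex.getOrDefault(kind, List.of());
    }

    // Like: SELECT * FROM User WHERE firstname = 'Joe'.
    // Step 1 reads keys from the index, step 2 fetches each entity by key.
    List<Map<String, Object>> query(String kind, String property, Object value) {
        String row = kind + "/" + property + "/" + value;
        List<Map<String, Object>> result = new ArrayList<>();
        for (String key : propertyIndex.getOrDefault(row, List.of())) {
            result.add(entitiesByKey.get(key));
        }
        return result;
    }

    public static void main(String[] args) {
        IndexSketch ds = new IndexSketch();
        ds.put("User", "k1", Map.of("firstname", "Joe", "lastname", "Smith"));
        ds.put("User", "k2", Map.of("firstname", "Joe", "addDate", "2024-01-01"));
        System.out.println(ds.query("User", "firstname", "Joe"));
    }
}
```

Both entities come back from the firstname query even though only one of them has an addDate, which is the point about entities of the same kind not sharing a structure.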
These complications affect the way data is accessed pretty dramatically, and really affect things like transactions and complex queries. You're basically right that you don't have to change your thinking too much, but you should definitely understand how indexes and transactions work before planning your data structures. It is not always simple to efficiently tack on extra queries that you didn't think of before you got started, and it's pretty expensive to maintain these indexes, so the fewer you can get by with the better.

A great introduction to the Google Datastore was written by the creator of the Objectify framework: Fundamental Concepts of the Datastore

persisting dynamic properties and query

I have a requirement to implement a contact database. This contact database is special in that the user should be able to dynamically (at runtime) add properties he/she wants to track about the contact. Some of these properties are strings, others numbers and dates. Some of the properties have predefined values, others are free-form fields, etc. The user also wants to be able to query such a structure quickly and easily. The database needs to easily handle 500,000 contacts, each having around 10 properties.
This leads to a dynamic property model with a Contact class holding dynamic properties.
class Contact {
    private Map<DynamicProperty, Collection<DynamicValue>> propertiesAndValues;
    // other useful methods
}
The question is how I can store such a structure in "some database" (it does not have to be an RDBMS) so that I can easily express queries such as:
Get all contacts whose name starts with Martin and who are from a company of size 5000 or less, ordered by the time the contact was inserted into the database, first 100 results only (with pagination), where each of these segments corresponds to a dynamic property.
I need:
- filtering: equal, partial match (plus greater than / less than for integers and dates), and maybe aggregation, though it is not necessary at this point
- sorting
- pagination
I was considering an RDBMS, but this leads more or less to the structure below, which is quite hard to query and tends to be slow for this amount of data
contact(id serial pk,....);
dynamic_property(dp_id serial pk, ...);
--only one of the values is not empty
dynamic_property_value(dpv_id serial pk, dynamic_property_fk int, value_integer int, date_value timestamp, text_value text);
contact_properties(pav_id serial pk, contact_id_fk int, dynamic_property_fk int);
property_and_its_value(pav_id_fk int, dpv_id int);
I am considering the following options:
- store contacts in an RDBMS and use Lucene for querying (is there anything that would help with this?)
- store dynamic properties as XML in an RDBMS and use its XPath support (unfortunately it seems to be pretty slow for 500,000 contacts)
- use another database, such as MongoDB or Jackrabbit, to store this information
Which way would you go and why?

Wikipedia has a great entry on Entity-Attribute-Value modeling which is a data modeling technique for representing entities with arbitrary properties. It's typically used for clinical data, but might apply to your situation as well.
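To show what the Entity-Attribute-Value shape means for the sample query in the question, here is a small in-memory sketch. It is not a storage recommendation: a real database would push the filtering into indexes instead of pivoting every row. The attribute names ("name", "companySize", "insertedAt") and the assumption that every contact has an insertedAt value are made up for the example.

```java
import java.util.Comparator;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.stream.Collectors;

class EavSketch {
    // One row per (entity, attribute, value); a contact is the set of rows
    // sharing a contactId.
    record Row(long contactId, String attribute, Object value) {}

    // Regroups the rows into one property map per contact.
    static Map<Long, Map<String, Object>> pivot(List<Row> rows) {
        Map<Long, Map<String, Object>> contacts = new TreeMap<>();
        for (Row r : rows) {
            contacts.computeIfAbsent(r.contactId(), k -> new HashMap<>())
                    .put(r.attribute(), r.value());
        }
        return contacts;
    }

    // Name prefix + maximum company size, ordered by insertion time,
    // returning one page of contact ids.
    static List<Long> query(List<Row> rows, String namePrefix,
                            int maxCompanySize, int offset, int limit) {
        return pivot(rows).entrySet().stream()
                .filter(e -> e.getValue().get("name") instanceof String n
                        && n.startsWith(namePrefix))
                .filter(e -> e.getValue().get("companySize") instanceof Integer s
                        && s <= maxCompanySize)
                // Assumes every contact carries a Long "insertedAt" value.
                .sorted(Comparator.comparing(e -> (Long) e.getValue().get("insertedAt")))
                .skip(offset)
                .limit(limit)
                .map(Map.Entry::getKey)
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<Row> rows = List.of(
                new Row(1, "name", "Martin Fowler"),
                new Row(1, "companySize", 4000),
                new Row(1, "insertedAt", 300L),
                new Row(2, "name", "Martin Odersky"),
                new Row(2, "companySize", 50),
                new Row(2, "insertedAt", 200L));
        System.out.println(query(rows, "Martin", 5000, 0, 100));
    }
}
```

Adding a new property at runtime is just a new attribute name in the rows; the cost is that every filter has to look values up by attribute and check their type.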

Have you considered using Lucene for your querying needs? You could probably get away with just using Lucene and store all your data in the index. Although I wouldn't recommend using Lucene as your only persistence store.
Alternatively, you could use Lucene along with an RDBMS and take advantage of something like Compass.

You could try other kinds of databases, like CouchDB, which is a distributed, document-oriented database.
If you want a dumb solution, you could add some 50 columns to your contacts table, like STRING_COLUMN1, STRING_COLUMN2... up to 10, and DATE_COLUMN1..DATE_COLUMN10, plus a DESCRIPTION column. So if a row has a name, which is a string, then STRING_COLUMN1 stores the value of the name and the DESCRIPTION column value would be "STRING_COLUMN1-NAME". In this case querying can be a bit tricky. I know many purists will laugh at this, but I have seen a similar requirement solved this way in one of the apps :)
