I am building Java/Spring microservices where each service has its own database. Say I have a user service that stores user information in one of its tables, and an orders service that stores only the username of the person who placed the order, as described below:
User Service (UserService Database - User Table )
id firstName lastName username age
1 Chris Brown c.brown 20
2 John Doe j.doe 25
And orders service as below
Order Service (OrderService Database - Order Table )
id username productName productPrice OrderDate
1 c.brown Sony Mic 100$ 20-08-2018
2 j.doe Television j.doe 11-07-2018
The question is: what is the best approach to get firstName and lastName from the user service while listing orders? I am aware that microservices should communicate via REST APIs, but if I have 1000 users with orders, I would have to make 1000 calls to get each firstName and lastName, or pass the usernames as an array; either of which might be expensive.
I have read about using CQRS and event sourcing, but I am not sure how best to apply them in this scenario.
If you want to build a scalable and resilient application, your microservices should not make synchronous calls to one another (you can read The Art of Scalability book).
This means that when a microservice receives a request from its clients, it should already have all the data gathered in its local storage. In your case, you have two possibilities:
add the firstName, lastName columns to the Orders table
create another table with users having id, firstName, lastName columns and make a join when returning data to the clients.
To keep the replicated information eventually consistent with the source (the Users service), you can use one of the following techniques:
have a cron job that periodically fetches all the needed user information and refreshes the firstName, lastName columns.
use integration events; in CQRS/event-sourcing architectures you already have domain events that you can subscribe to. If you have a plain architecture without CQRS, you can add triggers to your database that push low-level mutation events (row created/updated/deleted) to the subscribed services. For more options, read the book Migrating to Microservice Databases by Edson Yanaga.
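A minimal plain-Java sketch of the event-based technique, with a Map standing in for the Orders-side replica table; the UserUpdated event and all class names here are illustrative, not from the question:

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical integration event published by the Users service.
record UserUpdated(String username, String firstName, String lastName) {}

// Orders-side subscriber that keeps the replicated firstName/lastName
// data eventually consistent; a Map stands in for the replica table.
class ReplicatedUserStore {
    private final Map<String, String[]> usersByUsername = new HashMap<>();

    // Called for every UserUpdated event received from the message broker.
    public void onUserUpdated(UserUpdated event) {
        usersByUsername.put(event.username(),
                new String[] { event.firstName(), event.lastName() });
    }

    // Used when listing orders: no synchronous call to the Users service.
    public String displayName(String username) {
        String[] name = usersByUsername.get(username);
        return name == null ? username : name[0] + " " + name[1];
    }
}
```

With this in place, listing orders only touches the Orders service's own storage.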
The Order service can keep a shadow copy of limited user information (first name and last name in your case) in its database, kept up to date via events, and can then build the Order object with that limited user information all by itself.
Ideally, there would be a REST call to get the data from the other microservice.
But if these calls are too expensive, you should consider changing the database design and putting the required data in a common place.
Related
There are two microservices: one is an account service, the other a customer service.
The account service stores data such as account id, customer id, account balance, interest earned, and other monetary information.
The customer service has personal data such as customer id, home address, email id, the frequency for sending emails/alerts, customer type (Premium, Gold, Silver), etc.
The account has the customer id as a foreign key mapping account data to customer data.
The requirement is to build a memory-efficient search UI: fetch all the accounts that belong to a particular customer type (Premium/Gold) and have a balance greater than $200. As you can see, there is a cross-join filter, i.e. a filter needs to be applied in both microservices to get the desired data. Also, only 10 records are fetched at a time (the page size); clicking the next button fetches the next 10 records.
So how do I fetch the records and store the data for this scenario? This is a design/architecture question; I am not looking for query syntax, SQL syntax, or Hibernate (ORM) syntax.
Should I fetch all the records at once, keep them in memory, and then apply filters along with pagination? Or should I use some batching?
Also, I would prefer to avoid SQL joins. They create a dependency: if the database tables are altered, the query is impacted.
I tried to make this as simple as possible with a short example.
We have two databases, one in MS SQL Server and the other in Progress.
We have the following user DTO, which we show in a UI table within a web application.
User
int, id
String, name
String, accountNumber
String, street
String, city
String, country
Now this DTO (entity) is not stored in only one database: some fields for the same user are stored in one database and some in the other.
MS SQL Server
Table user
int, id
String, name
String, accountNumber
Progress
Table userModel
int, id
String, street
String, city
String, country
As you can see, the key is the only piece that links the two tables across the databases; as I said before, they are not in the same database and do not even share the same database vendor.
We have a requirement to sort the UI table by each column. Obviously we need to build the user DTO with information coming from both databases.
Our current proposal: if the user wants to sort by the street field, we run a query against the Progress database, obtain a page (using pagination) from that result set, then go directly to the SQL Server user table with those keys and run another query to fetch the missing information, merge it into our DTO, and send it to the UI. This implies running a query in one database and then, based on the returned keys, another query in the second database.
The order of the databases could change depending on which column (field) the user wants to sort by.
Technically, we will create a JpaRepository that acts as a facade and, depending on the field, runs the process against the correct database.
My question is: is there some kind of pattern that is commonly used in these scenarios? We are using Spring, so perhaps Spring has some out-of-the-box features to support this requirement. It would be great if this is possible using JpaRepositories (I have several doubts about it, as we will use two different EntityManagers, one for each database).
Note: moving data from one database to the other is not an option.
For this, you need a separate DataSource/EntityManagerFactory/JpaRepository pair per database.
There is no out-of-the-box support for this architecture in the Spring framework, but you can easily hide the dual DataSource pair behind a service layer. You can even configure JTA DataSources for ACID operations.
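The two-phase fetch the question describes (a sorted page of keys from the database that owns the sort column, then a keyed lookup in the other) can be sketched with plain collections standing in for the two repositories behind the service layer; all names here are illustrative, not real Spring APIs:

```java
import java.util.*;
import java.util.stream.Collectors;

// Merged DTO built from both databases.
class UserDto {
    int id; String name; String street;
    UserDto(int id, String name, String street) {
        this.id = id; this.name = name; this.street = street;
    }
}

// Facade hiding the two data sources; maps stand in for the repositories.
class DualDbFacade {
    // Progress side: id -> street (the sortable column lives here).
    Map<Integer, String> progress = new HashMap<>();
    // SQL Server side: id -> name.
    Map<Integer, String> mssql = new HashMap<>();

    // Sort and paginate in the database that owns the sort column,
    // then fetch the missing fields by key from the other database.
    List<UserDto> pageSortedByStreet(int page, int size) {
        List<Integer> keys = progress.entrySet().stream()
                .sorted(Map.Entry.comparingByValue())
                .map(Map.Entry::getKey)
                .skip((long) page * size).limit(size)
                .collect(Collectors.toList());
        return keys.stream()
                .map(id -> new UserDto(id, mssql.get(id), progress.get(id)))
                .collect(Collectors.toList());
    }
}
```

Sorting by a SQL Server-owned column would mirror the same flow with the roles of the two sides swapped.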
Since you will always need to fetch data from both databases, why not populate local Java User objects and then sort those objects (using a Comparator with the appropriate fields you want to sort on)?
The advantage of sorting locally versus sorting in the database query is that you won't have to send requests to the databases every time the sorting field changes.
So, to summarize:
1- Issue two SQL queries against the two databases to get your users.
2- Build your User objects from the retrieved values.
3- Use Java Comparators to sort the users on any field without issuing new queries to the databases.
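A minimal sketch of step 3, assuming a User object with name and street fields (class and field names are illustrative):

```java
import java.util.Comparator;
import java.util.List;

// User object already populated from both databases.
class User {
    String name; String street;
    User(String name, String street) { this.name = name; this.street = street; }
}

// Sorting on any column is now a local operation; switching the
// sort field needs no new database query.
class UserSorter {
    static void sortBy(List<User> users, String field) {
        Comparator<User> cmp;
        if (field.equals("street")) {
            cmp = Comparator.comparing(u -> u.street);
        } else {
            cmp = Comparator.comparing(u -> u.name);
        }
        users.sort(cmp);
    }
}
```

Secondary sort keys can be chained with `thenComparing` if two rows tie on the primary field.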
My advice would be to find a way to link the two databases together so that you can use database driver features without affecting your code.
Essentially, if the Progress database can be linked to SQL Server, you will be able to query both databases with a single SQL query joining on the id column, and you will get a merged, sorted, and paginated result set for your application to display.
I am not an expert in the Progress database, but there appears to be an ODBC driver for it, so you might try linking it to SQL Server.
I am reading about GAE and its Datastore. I came across this question and article. I wonder: if my users are identified by, say, email, would it be reasonable to use the same parent for all users and the email as the key, with the goal of resolving conflicts when two different users try to use the same email as their identifier? In theory, if the number of users becomes large (say, 10M), could that cause any issues? From my perspective, gets should be just fine, but puts are the ones that are serialized. So if gets significantly dominate puts (which really happen only when creating a new user), I don't see any issues. But...
Key parent = KeyFactory.createKey("parent", "users");
Key user = KeyFactory.createKey(parent, "user", "user@domain.com");
When to use entity groups in GAE's Datastore
https://developers.google.com/appengine/articles/scaling/contention
I also faced the unique email issue and here's what I've done:
Set up a kind called "Email" and use the user-entered email as its string key. This is the only way you can make a field both scalable and unique in the Datastore. Then set up another kind called "User" whose key is an auto-generated id:
Email
key: email,
UserKey: datastore.Key
User
key: auto_id,
Password: string,
Name: string
In this setup, the email can be used for login, and users have the option to change their email (or have multiple emails), while emails remain unique system-wide.
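The uniqueness guarantee of this layout can be modeled in plain Java: keying the Email kind by the email string makes claiming an email an atomic insert-if-absent, much like a keyed put inside a Datastore transaction. All class and method names below are illustrative, not GAE APIs:

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;
import java.util.concurrent.atomic.AtomicLong;

// Model of the two-kind layout: Email (keyed by the email string)
// points at User (keyed by an auto-generated id).
class UserDirectory {
    private final ConcurrentMap<String, Long> emailToUserKey = new ConcurrentHashMap<>();
    private final AtomicLong nextUserId = new AtomicLong(1); // stands in for auto ids

    // Returns the new user id, or null if the email is already taken.
    public Long register(String email) {
        long id = nextUserId.getAndIncrement();
        Long existing = emailToUserKey.putIfAbsent(email, id);
        return existing == null ? id : null;
    }
}
```

Changing a user's email is then an insert of the new Email entity plus a delete of the old one, with the User entity untouched.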
====================
It's not scalable if you put every user under the same parent. You will end up with all the data stuck on one particular "server", because entities from the same entity group are stored in close proximity, and you will end up facing the ~5 writes per second limit.
=====================
As a general rule of thumb, things that scale (e.g. users) must be root entities to enjoy the Datastore's scalability.
I think I have found the answer to my question: https://developers.google.com/appengine/articles/handling_datastore_errors in the Causes of Errors section:
The first type of timeout occurs when you attempt to write to a single entity group too quickly. Writes to a single entity group are serialized by the App Engine datastore, and thus there's a limit on how quickly you can update one entity group. In general, this works out to somewhere between 1 and 5 updates per second; a good guideline is that you should consider rearchitecting if you expect an entity group to have to sustain more than one update per second for an extended period. Recall that an entity group is a set of entities with the same ancestor; thus, an entity with no children is its own entity group, and this limitation applies to writes to individual entities, too. For details on how to avoid datastore contention, see Avoiding datastore contention. Timeout errors that occur during a transaction will be raised as an appengine.ext.db.TransactionFailedError instead of a Timeout.
I am new to databases, and before I start learning MySQL and using a Java driver to connect to my database on the server side, I wanted to get the design of my database down first. I have two columns in the database, CRN and DEVICE_TOKEN. The CRN will be a string of five digits, and the DEVICE_TOKEN will be a device token string (an iOS device token for push notifications).
Let me describe what I am trying to do. Users will send my server data from the iOS app, mainly their device token for push notifications and a CRN (course) they want to watch. There are going to be MANY device tokens requesting to watch the same CRN, and I want to know the most efficient way to store these in a database. I am going to have one thread loop through all of the rows in the DB and poll the website for each CRN. If the event I am looking for takes place, I want to notify every device token associated with that CRN.
Initially, I wanted to have one column be the CRN and the other column be DEVICE_TOKENS. I have learned, though, that this is not possible, and that each column entry should hold only one value. Can someone help me figure out the most efficient way to design this database?
CRN DEVICE_TOKEN
12345 "string_of_correct_size"
12345 "another_device_token"
Instead of making multiple requests to the website for the same CRN, it would be MUCH more efficient to poll the website once per unique CRN per iteration and then notify all the associated device tokens of the change. How should I store this information? Thanks for your time.
In this type of problem, where you have a one-to-many relationship (one CRN with many device tokens), you want a separate table to store the CRNs, with a unique id assigned to each new CRN. A separate table should then be made for your device tokens, with columns for a unique id, the CRN's id, and the DEVICE_TOKEN.
With this schema, you can go through the rows of the CRN table, poll the website once per CRN, and then do a simple JOIN with the DEVICE_TOKEN table to find all the subscribed devices if a change occurs.
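A small plain-Java sketch of the polling side, grouping the (CRN, DEVICE_TOKEN) rows so the website is polled once per unique CRN and all subscribed tokens can be notified together (class and method names are illustrative):

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class CrnWatcher {
    // Each row is a (crn, token) pair, as in the two-table schema's join.
    // Grouping by CRN yields one poll per unique CRN, with the full
    // list of tokens to notify if that CRN's event fires.
    static Map<String, List<String>> groupByCrn(List<String[]> rows) {
        Map<String, List<String>> byCrn = new HashMap<>();
        for (String[] row : rows) {
            byCrn.computeIfAbsent(row[0], k -> new ArrayList<>()).add(row[1]);
        }
        return byCrn;
    }
}
```

In the real service, the rows would come from the JOIN query and the notification would go through the push-notification provider for each token in a CRN's list.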
The usual way to do this would be to normalize out the courses, with a foreign key from the device tokens, e.g. two tables:
Courses
id CRN
1 12345
InterestedDevices
id course DEVICE_TOKEN
1 1 "string_of_correct_size"
2 1 "another_device_token"
You can then find the interested devices with SQL like the following:
SELECT *
FROM Courses
JOIN InterestedDevices ON Courses.id = InterestedDevices.course
WHERE Courses.CRN = ?
This way you avoid duplicating the course information over and over.
I am in the early stages of designing a VERY large system (an enterprise-level point-of-sale system). As some of you know, the data models on these things can get very complicated. I want to run it on Google App Engine because I want to put my resources into developing the software rather than building and maintaining an infrastructure.
In that spirit, I've been doing a lot of reading on GAE and the Datastore. I'm an old-school relational database modeler, and I've seen several different conceptions of what a schemaless database is. I think I've figured out what the Datastore is, but I want to make sure I have it right.
So, if I'm right, GAE is a sort of table-based system. So if I create a Java entity
class User {
    public String firstname;
    public String lastname;
}
and deploy it, the "table" user is automatically created and running. Then in a subsequent release, if I modify the user class
class User {
    public String firstname;
    public String lastname;
    public Date addDate;
}
and deploy it, the "table" user is automatically updated with the new field.
Now, for relating data, as I understand it, it's very similar to some massively complex systems like SAP, where the data is in fact very organized but, due to the volume, referential integrity is a function of the application rather than the database engine. So I would have code that looks like this:
class User {
    public long id;
    public String firstname;
    public String lastname;
}

class Phone {
    public String phonenumber;
    public User userentity;
}
and to pull up the phone numbers for a user, instead of
select phone.* from phone inner join user on phone.userentity = user.id where user.id = 5
(pseudo-SQL, but you get the point)
I would do something like
select user from user where user.id = 5
then
select phone from phone where phone.userentity = user
and that would retrieve all the phone numbers for the user.
So, as I understand it, it's not so much a huge change in how to think about structuring and organizing data as it is a big change in how to access it: I do joins manually in code instead of letting the database engine do them automatically. Beyond that it's the same. Am I correct, or am I clueless?
There are really no tables at all. If you create some users with only a first and last name, and then later add addDate, your original entities will still not have an addDate property. The user entities are not connected at all, in any way; they are not in a table of Users.
You can access all of the objects you wrote to the database that have the name "User" because appengine keeps big, long lists (indexes) of all of the objects that have each name. So, any object you put in there that has the name (kind) "User" will get an entry in this list. Later, you can read that index to get the location of each of your objects, and use those locations (keys) to fetch the objects. They are not in a table, they're just floating around. Some of them have some properties in common, but this is a coincidence, and not a requirement.
If you want to fetch all of the User objects that have a certain name (Select * from User where firstname="Joe") then you have to maintain another big long index of keys. This index has the firstname property as well as the key of an entity on each row. Later you can scan the index for a certain firstname, get all the keys, and then go look up the actual entities you stored with those keys. All of THOSE entities will have the firstname property (because you wouldn't enter an entity without the firstname property on your firstname index), but they may not have any other fields in common, because they are not in a table that enforces any data structure at all.
These complications affect the way data is accessed pretty dramatically, and really affect things like transactions and complex queries. You're basically right that you don't have to change your thinking too much, but you should definitely understand how indexes and transactions work before planning your data structures. It is not always simple to efficiently tack on extra queries that you didn't think of before you got started, and it's pretty expensive to maintain these indexes, so the fewer you can get by with the better.
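The manual join the question describes can be sketched in plain Java, with lists standing in for the Datastore queries (the Phone class and method names here are illustrative):

```java
import java.util.List;
import java.util.stream.Collectors;

// Phone entity referencing its owning user by key, as in the question.
class Phone {
    long userId; String number;
    Phone(long userId, String number) { this.userId = userId; this.number = number; }
}

class ManualJoin {
    // The "join in application code": first fetch the user by key,
    // then filter the phones by that key. In the Datastore, the filter
    // would be a query backed by an index on the userId property.
    static List<String> phonesForUser(long userId, List<Phone> allPhones) {
        return allPhones.stream()
                .filter(p -> p.userId == userId)
                .map(p -> p.number)
                .collect(Collectors.toList());
    }
}
```

Each such filtered query is what costs an index in the Datastore, which is why the preceding answer advises planning queries before planning data structures.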
A great introduction to the Google Datastore was written by the creator of the Objectify framework: Fundamental Concepts of the Datastore.