Friday, March 27, 2020

Basic DynamoDB Data Modelling




The key differentiator between a NoSQL database like AWS Dynamodb and a relational database like Oracle is of the way data is modelled. In a relational model; data is normalized, de-duplicated and relationships are established between entities. In a NoSQL database like Dynamodb, data can be duplicated and entities can be fused together. This is done to facilitate scalability which is hallmark of today's hyper-scale cloud based applications.


One of the most common relationship within data modelling is One-to-Many. For example, we have 2 entities; Painting and Gallery. A painting can only reside in one gallery, whereas a gallery can have multiple paintings. This is one to many relationship. In relational modelling world, we would have 2 tables; Painting and Gallery, such as:


Museum
Painting

MuseumId
MuseumName
PaintingId
PaintingName
MuseumId
M1
Sofía
P1
Guernica
M1
M2
NSW Gallery
P2
The Great Masturbator
M1
M3
Louvre
P3
Sofala
M2
P4
Mona Lisa
M3


In above One-to-Many relationship, we have joined both Museum and Painting with primary key and foreign key relationship. So e.g. if want to check all paintings in Museum M1, then the query would be:

select p.Title, m.MuseumName from Painting p, Museum m where p.MuseumId=m.MuseumId and m.MuseumId=’M1’;

Now joins are expensive, and they become more expensive as the data grows. In huge cloud scale databases this hampers scalability and at times become impractical to have these joins, hence the NoSQL databases like Dynamodb.

So how do you model One-to-Many relationship in a Dynamodb table when we cannot have joins and aggregations?

We know what the entities are, and we have an access pattern, so now let’s design our table.


MuseumPaintingTable
Primary Key
Attributes
PK
SK
MuseumName
Title
Artist
Museum#M1
Painting#P1
Sofía
Guernica
Picasso
Painting#P2
Sofía
The Great Masturbator
Salvador Dali
Museum#M2
Painting#P3
NSW Gallery
Sofala
Russel Drysdale
Museum#M3
Painting#P4
Louvre
Mona Lisa
Leonardo


If you are coming from a relational world, the above table may seem like blasphemy even though I haven’t yet used different attributes for each item to keep it simple. This is where you need to appreciate the flexibility of NoSQL databases to facilitate scalability.

In above table, we have define a composite primary key, consisting of a partition key and a sort key. A primary key in Dynamodb table defines a unique item, where item corresponds to a row. Remember that all access patterns are driven through the primary key, which means that the entry point for your queries must be the primary key. Every museum and painting has its own unique id, so we have combined them together to generate a unique item.

Note that how we have defined a one-to-many relationship in above table using partition key and sort key within the primary key. For one Museum#M1, we have 2 paintings; Painting#P1 and Painting#P2, and then both of these paintings have their own attributes, which can even differ in structure. We have fused both Painting and Museum entities together.

Now how would you write the previous relational query in DynamoDB, where we wanted to check all paintings in the Museum M1? It would be like this:

aws dynamodb query \
    --table-name MuseumPaintingTable \
    --key-condition-expression "PK = :MID" \
    --expression-attribute-values  '{":MID":{"S":"M1"}}'
               --projection-expression 'Title,MuseumName'

The above command is a RESTful API call to Dynamodb table.

No comments: