
Automate your SQL & NoSQL databases with AWS Managed Services

Info: This is a new post in our series of co-authored articles with my mate @Zoph ;)

Introduction

With the rise of managed cloud services comes a very important one: databases. So, what’s a managed database service?

Basically, a database is server-side software like MySQL, MariaDB, PostgreSQL or, for NoSQL, Redis, MongoDB, etc. But when you install and build your database server yourself, you have to manage its configuration, which is sometimes very tricky. A single mistake can kill performance.

You also have to manage the server itself, which means keeping up with updates, planning for rollbacks, and knowing your way around the OS, the network, security and so on.

A managed database is essentially a Database as a Service, provided by your favorite cloud provider. That means you don’t have to care about the server or the operating system itself.

Your favorite Cloud Service Provider (CSP), typically AWS, provides you with a console and an API that you can use to launch databases directly in the cloud. You don’t have to care about the configuration either, because your cloud provider sets it up automatically, depending on your CPU/RAM sizing and other variables. You will still be able to tune some settings, but in a limited way.

Finally, you can focus on your work as a developer and consumer of this database service.

AWS Services

Here are the managed database services available on AWS.

  • DynamoDB: A fast and flexible NoSQL database service. It’s a key-value and document database that lets you create tables of documents directly in the cloud.

  • RDS: A managed relational database service. It lets you create PostgreSQL, MySQL, MariaDB… databases without caring about servers or infrastructure. You can also set up replication across Availability Zones or regions.

  • RDS Aurora: A MySQL- and PostgreSQL-compatible relational database built for the cloud. AWS pitches it as offering the performance and availability of commercial-grade databases at 1/10th the cost.

  • Amazon DocumentDB: A fast, scalable, highly available, and fully managed document database service. It allows you to create MongoDB-compatible databases in the cloud.

  • Amazon Timestream: This one is a bit special: it’s a fully managed, time-series-oriented service. You can use it to collect, store, and process time-series data such as server and network logs, sensor data, and industrial telemetry data for Internet of Things (IoT) and operational applications.

  • Lightsail: Lightsail is mainly known for spinning up a Virtual Private Server (VPS) quickly, but you can also use it to launch PostgreSQL and MySQL instances without setting up the server. It’s simple and fast, but not cost-effective.

  • Amazon Aurora Serverless: It’s an on-demand, auto-scaling configuration for Amazon Aurora, which is compatible with MySQL. It enables you to run your database in the cloud without managing any database instances. It’s a simple, cost-effective option for infrequent, intermittent, or unpredictable workloads.

  • Amazon Neptune: A fast, reliable, fully managed graph database service that makes it easy to build and run applications that work with highly connected datasets. It’s highly available, with read replicas, point-in-time recovery, continuous backup to Amazon S3, and replication across Availability Zones.

  • Amazon Quantum Ledger Database (QLDB): It’s a fully managed ledger database that provides a transparent, immutable, and cryptographically verifiable transaction log owned by a central trusted authority. QLDB is also serverless, so it automatically scales to support the demands of your application. There are no servers to manage and no read/write limits to configure.

Alternatives

You have a lot of alternatives to AWS services, mainly in other big public clouds like Microsoft Azure and Google Cloud Platform (GCP).

Microsoft Azure

  • Cosmos DB: It’s a fully managed NoSQL database service, and can handle transparent multi-master replication. It’s elastic and provides unlimited scalability.

  • SQL Database: A managed SQL Server service. It allows you to create SQL Server-compatible databases in the Azure cloud. Like the others, it provides scalability and other fun features, such as built-in machine learning that tunes performance and security for you, or easy migrations with no changes to your code.

  • Azure Database for MySQL/PostgreSQL/MariaDB: Whether it’s MySQL, MariaDB or PostgreSQL, there’s a service named after it in the Azure cloud that lets you create a fully managed database as a service, so click and go. You can set up replication and backups, and sleep well thanks to the automatic scaling.

  • Storage table: A NoSQL key-value store which is serverless. You can store semi-structured data, create massively-scalable apps using JSON to serialize data. Pretty sexy.

Google Cloud Platform (GCP)

  • Cloud Datastore: A managed NoSQL database that automatically handles partitioning and data replication, so you get a durable, highly available database that can scale dynamically. It offers a multitude of features, such as ACID transactions, SQL-like queries, indexes and more.

  • Cloud SQL: A fully managed database service. You can launch PostgreSQL, MySQL, and SQL Server relational databases in the cloud. It provides high performance, scalability and convenience. It’s still in beta.

  • Cloud BigTable: A petabyte-scale, fully managed NoSQL database service, designed to support large-scale analytics and operational workloads.

  • Cloud Spanner: A fully managed relational database service for mission-critical workloads, designed to provide transactional consistency globally. It provides schemas, the benefits of SQL (ANSI 2011 with extensions), and automatic synchronous replication to ensure high availability.

Of course, there are other managed database services out there, from other Cloud Service Providers (CSP).

Another alternative is to set up a private cloud like OpenStack. Then it’s up to you to install the database services you want, like Trove, so that your end users can consume a fully managed database service. And fully set up by you ;-)

Finally, you can install your database service on a server, whether it’s in a public or private cloud or on-premises (bare metal).

Automate

Ansible

For most of the technologies we presented above, there’s a set of Ansible modules to help you interact with them. Let’s see some examples.

*Note: since we’re talking about Ansible modules, you don’t have to install or configure anything other than Ansible to execute the following examples. Just don’t forget to set up your .aws/credentials or to export the AWS_ACCESS_KEY_ID and AWS_SECRET_ACCESS_KEY variables in your environment.*

Not all services can be built or managed with Ansible at this point, so there’s no example for QLDB, Timestream, DocumentDB, Aurora Serverless and Neptune.

DynamoDB

To work with the AWS DynamoDB service, there’s the dynamodb_table module. It can create or delete DynamoDB tables, update the provisioned throughput on existing tables, set indexes, and return the status of the specified table.

Here’s how to create a table with it:

- hosts: localhost
  connection: local  # AWS API calls run from the control node; no privilege escalation needed

  tasks:
    - name: "Create table commands" 
      dynamodb_table:
        name: commands
        region: eu-west-1
        hash_key_name: id
        hash_key_type: STRING
        range_key_name: create_time
        range_key_type: NUMBER
        read_capacity: 2
        write_capacity: 2
        tags:
          tag_name: tags_for_commands

You can delete the same table by providing only its name and region, and setting the state to absent:

- hosts: localhost
  connection: local  # AWS API calls run from the control node; no privilege escalation needed
  
  tasks:
    - name: Delete table commands
      dynamodb_table:
        name: commands
        region: eu-west-1
        state: absent

And it goes without saying that you can increase the read or write capacity using the same task, with different values for read_capacity and write_capacity.

RDS

As with DynamoDB and many other parts of AWS, there’s an Ansible module to deal with RDS.

As with the other DBMSs supported by RDS, you can create a MySQL database with a simple task:

- hosts: localhost
  connection: local  # AWS API calls run from the control node; no privilege escalation needed
  
  tasks:
    - name: Create a database for my poneys!
      rds:
        command: create
        instance_name: my-little-poney
        db_engine: MySQL
        size: 20
        instance_type: db.t3.small
        username: superadmin
        password: use_a_vault_here_please
        tags:
          Environment: pasture

As simple as that. Let’s look at something cooler. For example, we can create a replica of an existing database with Ansible.

- hosts: localhost
  connection: local  # AWS API calls run from the control node; no privilege escalation needed
  
  tasks:
    - name: Create a replica database from my-little-poney
      rds:
        command: replicate
        instance_name: my-little-poney-backup
        source_instance: my-little-poney
        wait: yes
        wait_timeout: 300

And so on. Ansible covers a few AWS managed services, and the documentation is really great.

Nevertheless, keep in mind that Ansible does not cover the whole AWS scope. To build a real infrastructure, with all the necessary tools and granularity, you may want to consider Terraform, as in the following example.

Terraform

Below is an example that deploys a simple WordPress database with a random password.

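resource "random_string" "snaprand" {
  length  = 4
  special = false
}

# Random master password referenced by the DB instance below.
# The length of 16 is an assumed value; pick whatever fits your policy.
resource "random_string" "rds_password" {
  length  = 16
  special = false
}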

resource "aws_db_instance" "wordpress-db" {
  allocated_storage    = 20
  storage_type         = "gp2"
  engine               = "mariadb"
  engine_version       = "10.2.12"
  instance_class       = "${var.rds_class}"
  identifier           = "wpdb-${var.project-name}-${var.env}"
  name                 = "${var.project-name}${var.env}"
  username             = "master"
  password             = "${random_string.rds_password.result}"
  vpc_security_group_ids = ["${aws_security_group.rds_security_group.id}"]
  final_snapshot_identifier ="${var.project-name}-finalsnap-${random_string.snaprand.result}"

  tags {
    Name               = "wpdb-${var.project-name}-${var.env}"
    project            = "${var.project-name}"
  }
}

CloudFormation

The following example shows how to deploy a single PostgreSQL instance, without multi-AZ, with customized settings such as IAM authentication and forced SSL.

  RDSDBInstanceParameterGroup:
    Type: AWS::RDS::DBParameterGroup
    Properties:
      Description: !Sub 'postgres9.6.${EnvironmentName}'
      Family: postgres9.6
      Parameters:
        rds.force_ssl: '1'  # reject non-SSL connections

  RDSDBInstance:
    Type: AWS::RDS::DBInstance
    Properties:
      # DBName must be a plain string, not a list; PostgreSQL database names cannot contain hyphens
      DBName: !Sub '${EnvironmentName}_db'
      EnableIAMDatabaseAuthentication: true
      Engine: postgres
      DBParameterGroupName: !Ref RDSDBInstanceParameterGroup
      DBSubnetGroupName: !Ref RDSSubnetGroup
      PubliclyAccessible: true
      VPCSecurityGroups:
        - !GetAtt DatabaseSecurityGroup.GroupId
      MasterUsername: !Ref MasterUsername
      MasterUserPassword: !Ref MasterUserPassword
      DBInstanceClass: db.t2.small
      DeleteAutomatedBackups: false
      StorageEncrypted: true
      KmsKeyId: !Sub 'arn:aws:kms:eu-west-1:${AWS::AccountId}:alias/aws/rds'
      AllocatedStorage: '20'
      MultiAZ: false

  RDSSubnetGroup:
    Type: AWS::RDS::DBSubnetGroup
    Properties:
      SubnetIds: !Ref PublicSubnetIds
      DBSubnetGroupDescription: 'Subnet group for RDS'

Command-Line Interface (CLI)

Assuming you know the awscli tool, we’re going to provide some examples to run your managed databases in the AWS cloud. Yes, run, not build, because in real life we build a lot with Terraform or CloudFormation and not so much with awscli.

Note: Remember that your credentials need to be set up in your ~/.aws/credentials configuration file. Also, note that some services may not be available in your favorite region; please refer to the AWS regional table to find out. Here, we’re using eu-west-1 unless specified otherwise. If you want to know more, you can visit the official documentation.

The logic behind awscli is pretty simple. Your commands will always look like:

$ aws <service> <command> --argument value
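
If you script around awscli, the same pattern can be assembled programmatically. Here is a minimal Python sketch: the build_aws_command helper is hypothetical (it is not part of awscli), and it only builds the argument list without running anything.

```python
import shlex


def build_aws_command(service, command, **options):
    """Build an `aws <service> <command> --argument value` argument list.

    Hypothetical helper for illustration only: keyword names are turned
    into long options by replacing underscores with dashes.
    """
    args = ["aws", service, command]
    for key, value in options.items():
        args.append("--" + key.replace("_", "-"))
        args.append(str(value))
    return args


cmd = build_aws_command(
    "qldb", "create-ledger", name="my-awesome-qldb", permissions_mode="ALLOW_ALL"
)
# shlex.join (Python 3.8+) quotes each argument so the line can be pasted into a shell
print(shlex.join(cmd))
```

Assuming awscli is installed and your credentials are configured, the resulting list can be handed to subprocess.run(cmd, check=True) to actually execute it.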

Here are a few examples to get you started.

QLDB

QLDB has only recently become available in eu-west-1. As we said, it’s a ledger database for storing contracts and transactions; you could say it’s blockchain-like.

Let’s create one of these:

$ aws qldb create-ledger --name "my-awesome-qldb" --permissions-mode "ALLOW_ALL"

Now you can interact with it, it’s alive:

$ aws qldb list-ledgers
{
    "Ledgers": [
        {
            "Name": "my-awesome-qldb",
            "State": "ACTIVE",
            "CreationDateTime": 1568318095.065
        }
    ]
}
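
Since awscli prints JSON, its output is easy to consume from a script. The sketch below parses the sample output shown above with the Python standard library; the active_ledgers function name is hypothetical, and the sample is pasted inline instead of being read from a real call.

```python
import json
from datetime import datetime, timezone

# Sample output copied from the `aws qldb list-ledgers` call above
raw_output = """
{
    "Ledgers": [
        {
            "Name": "my-awesome-qldb",
            "State": "ACTIVE",
            "CreationDateTime": 1568318095.065
        }
    ]
}
"""


def active_ledgers(cli_output):
    """Return (name, creation time in UTC) for every ledger in the ACTIVE state."""
    ledgers = json.loads(cli_output)["Ledgers"]
    return [
        (
            ledger["Name"],
            # CreationDateTime is a Unix timestamp in seconds
            datetime.fromtimestamp(ledger["CreationDateTime"], tz=timezone.utc),
        )
        for ledger in ledgers
        if ledger["State"] == "ACTIVE"
    ]


for name, created in active_ledgers(raw_output):
    print(name, created.date().isoformat())
```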

I’m not an expert in ledger databases, so you’ll have to figure out how to use it by yourself ;-)

For all your awscli experiments, you can check a link like:

https://docs.aws.amazon.com/cli/latest/reference/<service>/

Fun fact: AWS DocumentDB is named docdb in this documentation.
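
As a small illustration of that URL pattern and of the docdb naming quirk, here is a Python sketch. The cli_reference_url function and the CLI_SERVICE_NAMES mapping are hypothetical names, and only the DocumentDB entry comes from this article.

```python
# Assumed mapping from a product name to its awscli service name.
# Only the DocumentDB -> docdb entry comes from the article; extend as needed.
CLI_SERVICE_NAMES = {"documentdb": "docdb"}


def cli_reference_url(service):
    """Return the awscli reference URL for a service, following the pattern above."""
    name = service.lower()
    name = CLI_SERVICE_NAMES.get(name, name)
    return f"https://docs.aws.amazon.com/cli/latest/reference/{name}/"


print(cli_reference_url("qldb"))
print(cli_reference_url("DocumentDB"))
```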

NoSQL Design consideration

If you want to go deeper on NoSQL and DynamoDB design for your cloud-native workload, take a look at this paper on NoSQL design considerations before starting anything:

  • AWS Docs: NoSQL Design Consideration

Multi-region

Generally speaking, all these managed services are multi-AZ capable, but only a few of them are natively multi-region.

Before starting, ask yourself if your selected Managed Database Service is available in your AWS Region. You will find in this Region table the list of AWS services, and the associated availability for each AWS region.

Managed Service                | Multi-Region? | Comment
DynamoDB                       | Y (*)         | (*) Use global tables for multi-region
RDS                            | N             | DIY, multi-AZ only
Aurora Serverless              | Y             | MySQL only
DocumentDB                     | N             | Does not support cross-region replicas
Timestream                     | N             | Preview
Neptune                        | N             | Does not support cross-region replicas
Quantum Ledger Database (QLDB) | N             | Multi-AZ only
Scalability

For the majority of these managed services you don’t have to manage scalability yourself, as it is either native or much easier to handle. For some database services you will need to deploy read replicas, or to set up “options” to enable scalability. DynamoDB is a good example, with two modes:

  • On-Demand: a pay-as-you-go model.

  • Provisioned: you need to do some capacity planning (load tests, sharding) to prepare for heavy usage of your table.
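
To make the capacity planning for provisioned mode more concrete, here is a rough Python sketch of the arithmetic. It relies on the DynamoDB capacity unit definitions (one read capacity unit covers one strongly consistent read per second of an item up to 4 KB, an eventually consistent read costs half, and one write capacity unit covers one write per second of an item up to 1 KB). The function names and the traffic figures are hypothetical.

```python
import math


def read_capacity_units(item_size_kb, reads_per_second, strongly_consistent=True):
    """Estimate the read capacity units needed for a steady read rate."""
    units_per_read = math.ceil(item_size_kb / 4)  # reads are billed in 4 KB steps
    total = units_per_read * reads_per_second
    # Eventually consistent reads consume half the capacity
    return total if strongly_consistent else math.ceil(total / 2)


def write_capacity_units(item_size_kb, writes_per_second):
    """Estimate the write capacity units needed for a steady write rate."""
    return math.ceil(item_size_kb) * writes_per_second  # writes are billed in 1 KB steps


# Hypothetical workload: 6 KB items read 100 times/s, 2.5 KB items written 10 times/s
print(read_capacity_units(6, 100))                             # strongly consistent
print(read_capacity_units(6, 100, strongly_consistent=False))  # eventually consistent
print(write_capacity_units(2.5, 10))
```

The results are the kind of values you would feed into read_capacity and write_capacity when provisioning a table, ideally after validating them with load tests.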

Pricing

  • It depends :-)

But usually, you can check pricing for a service on a link like the following:

https://aws.amazon.com/<service>/pricing

Bring Your Own Licences (BYOL)

AWS lets you use your own licenses on RDS workloads. It’s called BYOL, for Bring Your Own License. Major vendors such as Oracle and Microsoft offer this type of license.

More information