Automated GDrive Backups with ECS and S3
- Security Considerations
- How Much Does this Cost?
- Show Me the Code
Over the past couple of weeks, I started thinking a bit more about adding resiliency to my personal projects and accounts.
In a previous post (“Automated Github Backups with ECS and S3”) I started by taking a look at how to back up my Github data. In this blog post, instead, I’m going to focus on GDrive, since it is where I store the majority of my personal data.
In fact, I finally set some time aside to set up an automated process to back up my GDrive account, and I ended up relying on ECS (Fargate) and S3 Glacier. This blog post explains the architecture and implications of the final setup I decided to go with.
I like the idea but I will take another approach (Glacier instead of EFS, Terraform instead of CDK). I'll start working on this after publishing the blog post about Github Backups.— Marco Lancini (@lancinimarco) June 21, 2021
Similarly to what I described in “Automated Github Backups with ECS and S3” (this post will indeed have the same structure), this is what the final setup looks like:
- Backups of my GDrive account are taken via an ECS (on Fargate) Task Definition, with execution triggered periodically by a CloudWatch Event Rule, and secrets (i.e., the OAuth Token) pulled from Parameter Store.
- The data fetched from GDrive is zipped and uploaded to an S3 bucket, where it will transition to Glacier after one day.
- The task uses an EFS volume as a temporary location to store the files downloaded from GDrive before uploading them to S3. The task removes every file from the EFS volume upon completion.
- Notifications are sent via SNS for every task starting and/or stopping, as well as for every new object created in the destination S3 bucket.
Let’s see what all of this means, and let’s analyse the different components in more detail.
Docker Image and Backup Logic
Let’s start by talking about the Docker Image which hosts the actual application logic in charge of the backup.
The logic is based on a custom bash script which leverages rclone, a command line program designed to manage and sync files onto cloud storage.
The full script is available on Github, but it basically runs rclone with a custom config (more on this later) to first obtain a copy of the target GDrive folder, zip it, and then copy the zipped output to S3.
This script is then packaged as a Docker image, and stored in an ECR repository within one of my AWS accounts. The image is automatically built and pushed to ECR via Github Actions.
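As a rough illustration of the flow just described (a sketch, not the script published on Github), the steps could look like the following. The remote names `gdrive` and `s3`, the bucket name, the paths, and the `DRY_RUN` switch are all assumptions made for the example; with `DRY_RUN=1` (the default here) the commands are only printed.

```shell
#!/usr/bin/env bash
# Sketch of the backup flow: copy from GDrive, zip, upload the archive to S3.
# All names below are hypothetical placeholders, not the values used by the real script.
set -euo pipefail

RCLONE_CONFIG_FILE="${RCLONE_CONFIG_FILE:-/tmp/rclone.conf}"   # assumed config location
WORK_DIR="${WORK_DIR:-/mnt/efs/gdrive}"                        # assumed EFS mount path
BUCKET="${BUCKET:-my-gdrive-backups}"                          # hypothetical bucket name
ARCHIVE="${ARCHIVE:-/mnt/efs/gdrive-$(date +%Y-%m-%d).zip}"
DRY_RUN="${DRY_RUN:-1}"                                        # 1 = print commands only

# Run a command, or just print it when DRY_RUN=1.
run() {
  if [ "$DRY_RUN" = "1" ]; then
    printf '%s\n' "$*"
  else
    "$@"
  fi
}

backup() {
  # 1. Obtain a copy of the target GDrive folder.
  run rclone copy "gdrive:" "$WORK_DIR" --config "$RCLONE_CONFIG_FILE"
  # 2. Zip it.
  run zip -r -q "$ARCHIVE" "$WORK_DIR"
  # 3. Copy the zipped output to the destination bucket.
  run rclone copy "$ARCHIVE" "s3:${BUCKET}" --config "$RCLONE_CONFIG_FILE"
  # 4. Remove the temporary files from the volume.
  run rm -rf "$WORK_DIR" "$ARCHIVE"
}

backup
```

Keeping each step behind a small `run` helper makes it easy to check what would be executed before pointing the script at real data.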
Terraform and Infrastructure Setup
The rest of the components you can see in the “Architecture” diagram above are managed via Terraform. I ended up creating a module which can be used to create:
- An ECR repository to store the Docker image of the custom bash script.
- A destination S3 bucket with a lifecycle policy which transitions objects to Glacier after 1 day.
- A Systems Manager Parameter Store entry to store the rclone configuration file which contains the OAuth secrets.
- A dedicated VPC, and a subnet with an Internet Gateway (IGW) attached to it, to allow for egress traffic.
- An ECS cluster on Fargate, in the dedicated VPC.
- An ECS Task Definition, with execution triggered periodically (cron) by a CloudWatch Event Rule, and secrets pulled from Parameter Store.
- An EFS file system, with a mount target in the same subnet used by the ECS Task.
- For notifications:
  - A dedicated SNS Topic.
  - A CloudWatch Event Rule to alert on every ECS Task starting (RUNNING) and/or stopping (STOPPED).
  - An S3 Event Notification to alert on every new object created in the destination bucket.
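Although the module itself is written in Terraform, it can help to see the raw settings behind a few of the items above. The sketch below (plain shell, not the module's code) writes the S3 lifecycle rule and the CloudWatch event pattern as JSON. The rule ID, the file names, the 90-day expiration, and the monthly cron expression are assumptions inferred from what is described elsewhere in this post.

```shell
#!/usr/bin/env bash
# Sketch only: raw JSON equivalents of settings the Terraform module manages.
set -euo pipefail

OUT_DIR="${OUT_DIR:-$(mktemp -d)}"

# Lifecycle rule: move objects to Glacier after 1 day, expire them after 90 days.
cat > "$OUT_DIR/lifecycle.json" <<'EOF'
{
  "Rules": [
    {
      "ID": "gdrive-backups-to-glacier",
      "Status": "Enabled",
      "Filter": { "Prefix": "" },
      "Transitions": [ { "Days": 1, "StorageClass": "GLACIER" } ],
      "Expiration": { "Days": 90 }
    }
  ]
}
EOF

# Event pattern: match ECS tasks entering the RUNNING or STOPPED state.
cat > "$OUT_DIR/ecs-task-events.json" <<'EOF'
{
  "source": ["aws.ecs"],
  "detail-type": ["ECS Task State Change"],
  "detail": { "lastStatus": ["RUNNING", "STOPPED"] }
}
EOF

# Schedule: 00:00 UTC on the first day of every month (assumed monthly cadence).
SCHEDULE='cron(0 0 1 * ? *)'

echo "lifecycle rule: $OUT_DIR/lifecycle.json"
echo "event pattern:  $OUT_DIR/ecs-task-events.json"
echo "schedule:       $SCHEDULE"
```

Outside Terraform, files like these could be applied by hand with `aws s3api put-bucket-lifecycle-configuration --lifecycle-configuration file://lifecycle.json` and `aws events put-rule --event-pattern file://ecs-task-events.json`.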
With the infrastructure ready, the last component missing is a way for rclone to authenticate against the GSuite APIs. I came across a useful article which describes how to set this up using OAuth.
The process is composed of 3 parts, described next:
- Enable the Google Drive APIs.
- Generate OAuth credentials.
- Seed an rclone Config File.
Enable the Google Drive APIs
- Log in with your Google account at: https://console.cloud.google.com
- From the left sidebar, navigate to “APIs & Services > Library”.
- Search for and enable the “Google Drive API”.
- From the left sidebar, select “Credentials”, then “Configure Consent Screen”.
- In the “OAuth Consent Screen”, choose an application name, and provide user support and developer contact information.
- In the “Test users” screen, add the Gmail address associated with your GDrive.
Generate OAuth credentials
- From the left sidebar, navigate to “Credentials”, then “Create credentials” and select “OAuth client ID”.
- A client ID and secret will be generated.
Seed an rclone Config File
The last step involves creating a configuration file that can be used by rclone to authenticate against the GSuite APIs and get authorized to retrieve content from GDrive.
First of all, create a template file like the following, replacing client_id and client_secret with the ones generated previously:
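A template along these lines would match the description (a sketch, since the original template is not reproduced here). The `[gdrive]` and `[s3]` section names follow the text; `scope`, `provider`, `env_auth`, and `region` are standard rclone options, but the values chosen for them are assumptions, and the placeholders must be replaced with your own values.

```shell
#!/usr/bin/env bash
# Sketch: write a minimal rclone config template with a [gdrive] and an [s3] section.
# Option values are assumptions; replace the placeholders with your own credentials.
set -euo pipefail

CONFIG_FILE="${CONFIG_FILE:-$(mktemp)}"

cat > "$CONFIG_FILE" <<'EOF'
[gdrive]
type = drive
client_id = <YOUR_CLIENT_ID>
client_secret = <YOUR_CLIENT_SECRET>
scope = drive.readonly

[s3]
type = s3
provider = AWS
env_auth = true
region = <YOUR_REGION>
EOF

chmod 600 "$CONFIG_FILE"   # the file will hold secrets, so keep it private
echo "template written to $CONFIG_FILE"
```

With `env_auth = true`, rclone takes its AWS credentials from the runtime environment instead of the config file, so no S3 keys need to be stored alongside the OAuth secrets.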
In the config above, the [gdrive] section will be used by rclone to authenticate against GSuite, while the [s3] section is used to configure access to S3 (where the final zip will be uploaded). Authentication/authorization to S3 is handled separately by an IAM policy attached to the service account used by the ECS Task.
Next, run the custom Docker image to (re-)authenticate to GDrive, and follow the process. As a result, this step will add a token entry in the [gdrive] section of the config file.
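If you have rclone installed locally, one way to go through the OAuth flow by hand is `rclone config reconnect gdrive:`. To confirm that the token was actually written, a small check like the one below can help. It is only a sketch: the function name and the sample file are made up for illustration, and the token value is a dummy.

```shell
#!/usr/bin/env bash
# Sketch: check that the [gdrive] section of an rclone config contains a token entry.
set -euo pipefail

# has_gdrive_token FILE -> exit 0 if a "token =" line exists inside [gdrive].
has_gdrive_token() {
  awk '
    /^\[/ { in_gdrive = ($0 == "[gdrive]") }   # track which section we are in
    in_gdrive && /^token[ \t]*=/ { found = 1 }
    END { exit(found ? 0 : 1) }
  ' "$1"
}

# Demo on a throwaway file with a dummy (non-functional) token.
SAMPLE="$(mktemp)"
cat > "$SAMPLE" <<'EOF'
[gdrive]
type = drive
token = {"access_token":"dummy","token_type":"Bearer","refresh_token":"dummy","expiry":"2021-01-01T00:00:00Z"}

[s3]
type = s3
EOF

if has_gdrive_token "$SAMPLE"; then
  echo "token present"
else
  echo "token missing"
fi
```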
- Run the Terraform module above, which will setup all the necessary components.
- Create OAuth credentials for accessing GDrive, as outlined in the section above.
- Store the rclone config file in Parameter Store (for example, with a description such as “rclone config file for GDrive backups”).
- Build the custom Docker image and upload it to ECR. You could automate this via your CI/CD pipeline or, otherwise, push it manually with a short script.
- Wait till the first day of the next month (or run a Task manually), to have your GDrive backup stored in S3!
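Storing the config and pushing the image could be scripted roughly as follows. This is a sketch, not the author's script: the parameter name, repository name, account ID, and region are all hypothetical, and `DRY_RUN=1` (the default here) only prints the commands instead of running them.

```shell
#!/usr/bin/env bash
# Sketch: store the rclone config in Parameter Store and push the image to ECR.
# Every identifier below is a hypothetical placeholder.
set -euo pipefail

AWS_REGION="${AWS_REGION:-eu-west-1}"
ACCOUNT_ID="${ACCOUNT_ID:-123456789012}"
REPO="${REPO:-rclone-gdrive-backup}"
PARAM_NAME="${PARAM_NAME:-/gdrive-backups/rclone-config}"
REGISTRY="${ACCOUNT_ID}.dkr.ecr.${AWS_REGION}.amazonaws.com"
DRY_RUN="${DRY_RUN:-1}"   # 1 = print commands only

# Run a command, or just print it when DRY_RUN=1.
run() {
  if [ "$DRY_RUN" = "1" ]; then printf '%s\n' "$*"; else "$@"; fi
}

store_config() {
  # SecureString keeps the OAuth secrets encrypted at rest.
  run aws ssm put-parameter --name "$PARAM_NAME" --type SecureString \
    --value "file://rclone.conf" --overwrite --region "$AWS_REGION"
}

push_image() {
  # Log Docker in to the ECR registry (a pipeline, so handled outside run()).
  if [ "$DRY_RUN" = "1" ]; then
    echo "aws ecr get-login-password --region $AWS_REGION | docker login --username AWS --password-stdin $REGISTRY"
  else
    aws ecr get-login-password --region "$AWS_REGION" \
      | docker login --username AWS --password-stdin "$REGISTRY"
  fi
  run docker build -t "$REPO" .
  run docker tag "$REPO:latest" "$REGISTRY/$REPO:latest"
  run docker push "$REGISTRY/$REPO:latest"
}

store_config
push_image
```

Setting `DRY_RUN=0` runs the commands for real, which requires the AWS CLI and Docker to be installed and authenticated against the target account.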
Security Considerations
For those of you who already read “Automated Github Backups with ECS and S3”, you might notice this section is quite similar to what I already described there. I decided to repeat those conclusions here anyway, to keep this post self-contained.
Code related to all my personal projects is stored within a single monorepo, and all (well, the majority) of dependencies are vendorised (I briefly touched on this in “My Blogging Stack”, but this will probably warrant another post on its own).
The Terraform module described in this post leverages 2 other external modules.
Although the public module I released on Github uses the upstream versions, the module I use internally refers to local vendorised copies of these modules.
This is where this solution could be improved, in my opinion.
For my use case, I decided to store the rclone config file in Parameter Store instead of Secrets Manager, mainly from a pricing point of view, with Parameter Store not incurring additional charges for standard parameters.
For me, this is a “good enough” tradeoff for now, but I understand Secrets Manager could be seen as a more reliable solution for storing secrets.
For handling backups, I decided to have a dedicated AWS account.
Another improvement could involve setting up cross-account backups, via AWS Backup, to replicate the data stored in S3 into another account. This data, though, already exists in two places (the live data in GDrive, and the backup in S3), so it seems like overkill for now.
Two other options worth looking into could be S3 Object Lock and Glacier Vault Lock.
How Much Does this Cost?
Since I’ve just deployed this solution, I don’t have enough historical data to show you exactly how much I spent on it.
What I can do, though, is to use the AWS Pricing Calculator to give you an estimate:
| Service | Monthly Forecast ($) | First 12 months Forecast ($) |
As you can see, the biggest entry, as expected, will be storage: I expect to have ~50GB generated each month, for a total of ~150GB concurrently stored in Glacier at steady state (since the retention period for each backup is 90 days).
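The ~150GB figure follows from simple arithmetic, sketched below under the assumption of one ~50GB backup per month and 30-day months.

```shell
#!/usr/bin/env bash
# Sketch: estimate how much data sits in Glacier once old backups start expiring.
set -euo pipefail

BACKUP_SIZE_GB=50        # approximate size of one monthly backup
RETENTION_DAYS=90        # each backup is kept for 90 days
DAYS_PER_BACKUP=30       # assumption: one backup roughly every 30 days

BACKUPS_RETAINED=$(( RETENTION_DAYS / DAYS_PER_BACKUP ))
STEADY_STATE_GB=$(( BACKUP_SIZE_GB * BACKUPS_RETAINED ))

echo "backups retained at once: $BACKUPS_RETAINED"
echo "steady-state storage:     ${STEADY_STATE_GB}GB"
```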
Show Me the Code
As briefly mentioned, both the custom Docker image and the Terraform module needed to recreate the different components of the architecture are available on Github:
- The code for recreating the Docker image, alongside the custom bash script, can be found at: github.com/marco-lancini/utils/tree/main/docker/rclone-gdrive-backup
- The Terraform module can be found at: github.com/marco-lancini/utils/tree/main/terraform/aws-gdrive-backups
In this post I outlined the architecture and implications of an automated process to back up a GDrive account, relying on ECS Fargate and S3 Glacier.
I hope you found this post useful and interesting, and I’m keen to get feedback on it! If you found the information shared useful, if something is missing, or if you have ideas on how to improve it, please let me know on Twitter.