For the past few weeks, I’ve been working on spinning up a WordPress stack on Amazon AWS. It’s intended to be a production application, so it uses Multi-AZ and a few other tricks to try to achieve relatively high fault tolerance (nothing insane, still in one region). It uses AWS’s RDS hosted MySQL service for the database, and the stacks are created with CloudFormation. Using CloudFormation has been an utterly wonderful experience and being able to spin up an entire stack - multiple autoscaling web server instances, a database, memcache, etc. with the click of a button in ~20 minutes - is as close to operations nirvana as I’ve ever gotten.
One of the last steps for me was to work on database backups and restoration; both restoring the production application’s database to a previous snapshot, and restoring a production database snapshot to a test or development stack. This took a few days of testing, and I wasn’t able to find much complete information on the nuances of it; there are also some pieces that are not intuitive and (IMO) not documented well enough in the AWS docs. In short, it’s horribly easy to blow away your entire database. So, I’m going to attempt to document some of what I learned, in the hope that it will benefit others.
At the bottom of this post I’ve included some snippets from my CloudFormation template, which I make reference to. It’s probably worth looking through that, as I make reference to some of the names used in it. Also, to make sense of this, you should be familiar with the nomenclature used by CloudFormation, such as the template anatomy and the difference between parameters and properties, and resources and instances.
Note: I’m writing this in mid-December 2014. I’ll make every effort to keep this updated as I continue working with AWS, but it’s possible that some of the problems described herein will be fixed by AWS in the future.
DeletionPolicy Snapshot
CloudFormation resources support a DeletionPolicy attribute that says what to do to a resource when deleted. For RDS instances, “Snapshot” is an option, which takes a manual snapshot when the resource is deleted (manual snapshots, unlike the automated daily ones, live on even after the instance is deleted). Be warned, this only takes effect when you delete the entire stack. If you make a change to one of the DBInstance properties that requires a resource replacement to take effect, the RDS instance will be replaced with a new one, and all of the data and automatic snapshots from the old one will be deleted. That last part deserves repeating: automatic snapshots (the daily ones created by RDS) are tied to the instance; if the instance is replaced by CloudFormation, you lose all automatic (backup) snapshots with it.
Stack Policy to Prevent Updates
To prevent RDS data loss from accidentally changing a property of the instance, it’s wise to add a stack policy to prevent updates to RDS resources. This will prevent CloudFormation from making any changes to the RDS instance at all. Once the stack policy is in place, in order to make changes to the RDS instance you would either need to set a temporary stack policy to allow the update (see the “Updating Protected Resources” section of the stack policy documentation) or simply delete and re-create the stack (the recommended method, if it’s feasible for you).
Setting a proper stack policy should prevent many of the pitfalls I describe below; however, for completeness, I’ve described how RDS resources behave currently without a stack policy protecting them. The AWS::RDS::DBInstance resource documentation describes which properties can be updated in-place (“Update requires: No interruption” or “some interruptions”) and which trigger complete replacement of the RDS instance (“Update requires: replacement”).
When you try to update a protected resource through the aws
CLI tools, the update will appear to have worked, but the event
log on the stack will show the update denied and the update will be rolled back.
Restoring Snapshots and DBName
The DBSnapshotIdentifier property on a MySQL RDS instance specifies a RDS snapshot to restore into the instance. The DBName
property will create a new RDS instance with a single blank database of that name. This bears repeating again; if the DBName
property ever changes, your RDS instance will be replaced with one with a new, blank database of that name.
When creating a MySQL RDS instance, you can specify either the DBName
or DBSnapshotIdentifier
property, but not both;
if you attempt to specify both, you’ll get an error, “DBName must be null when Restoring for this Engine.”
If you want to restore a snapshot to a new RDS instance, you’ll need to ensure that DBName
is null (either not specified at all, or the special AWS::NoValue
pseudo parameter). In order
to do this automatically (and since NoValue/null can’t be passed in as a template parameter), in the template snippet below I’ve defined a
UseDbSnapshot
condition
that evaluates to true if the DBSnapshotIdentifier
parameter is not empty. In my RDS::DBInstance
resource,
I conditionally set (using Fn::If
)
the DBSnapshotIdentifier
and DBName
properties depending on the value of UseDbSnapshot
. The end result is that if the
DBSnapshotIdentifier
parameter is not empty, it is passed in as the DBSnapshotIdentifier
property of the resource and
the DBName
property is set to AWS::NoValue
. Otherwise, the DBSnapshotIdentifier
property is set to AWS::NoValue
and the DBName
parameter is passed in to the corresponding property on the resource (indicating to create a new blank database
of that name).
To explain this a bit more, CloudFormation seems to have no introspection into RDS instances. The DBName
parameter
exists only in CloudFormation itself, and is only evaluated as a diff from the previous template; if it changes,
CloudFormation spins up a completely new RDS instance with a single blank database of that name. Whether or not
the value of DBName
matches the database currently in the RDS instance (say, restored from a snapshot)
is not known by CloudFormation. In short, if you create an RDS instance from a snapshot of a “foo” database
and then change the template to have a DBName
of “foo”, CloudFormation will spin up a new RDS instance
with an empty “foo” database.
Restoring to a New Stack
When restoring to a new stack (stack creation), specify the DBSnapshotIdentifier
and make sure DBName
is set
to AWS::NoValue
per the previous paragraph (condition in the template). Note that for the life of the stack, you
must continue specifying these parameters (or the “use previous value” option for them). Using my example template
below, if you restored into a new stack using the DBSnapshotIdentifier
parameter and then later updated the stack
and omitted that parameter (which, because of the condition, would set it to NoValue
and set the DBName
parameter
to its default value) the RDS instance would be replaced with a new one with a blank database.
Because of this, stack updates should always use the previous value for the DBSnapshotIdentifier
parameter; this can
be done through the AWS Console, or using the aws
command line tools and a parameter like: --parameters ParameterKey=DBSnapshotIdentifier,UsePreviousValue=true
.
Restoring to an Existing Stack
Restoring a snapshot to an existing stack is a bit more nuanced. You can’t restore a snapshot to an existing RDS instance,
you can only restore to a new instance. If you do this through the AWS Console, you’ll end up with an RDS instance disconnected
from your CloudFormation stack. So the way to do this is more or less the same as restoring to a new stack - specify
the DBSnapshotIdentifier
parameter for your template, and it will create a new RDS instance with the snapshot. The same
rules about using previous values for the parameters hold true. If you used a stack policy to prevent updates to the RDS
instance, you’ll need to override that with a temporary policy when doing the restore.
There are a few caveats to keep in mind with this procedure. The first, obviously, is that there may be some application downtime
when the existing database is replaced with the new (restored) one, and any writes will obviously be lost. Also, this only
works on RDS instances that were created with DBName or a different snapshot. In order to restore the same snapshot to
an RDS resource a second time, you need to first update with the DBSnapshotIdentifier
parameter removed and have the RDS
instance re-created with an empty database, and then update again with the DBSnapshotIdentifier
in order to do the restore.
This is because CloudFormation doesn’t reconcile the current state of instances to determine which actions to take, it only diffs
the updated template against the existing one. If the existing template and the updated one have the same value for the RDS instance’s
properties (specifically DBSnapshotIdentifier
), CloudFormation determines there are no changes, and does nothing.
LaunchConfig Metadata Issues
The EC2 instances I’m using for this project are “baked” AMIs (built with packer.io) in an Auto-Scaling Group (ASG). They use a LaunchConfig to write out a file on disk with the database connection information for the application. In addition, my ASG has an UpdatePolicy designed to perform rolling updates (termination and replacement) of EC2 instances when their properties change.
In my testing, I noticed a number of times where updates to the RDS resource that triggered creation of a new RDS instance - such as restoring from a snapshot in an existing stack, or changing the DBName - properly triggered an update of the LaunchConfig, but failed to trigger the rolling update of the EC2 instances. This left the application in a state where one or more (sometimes all) of the EC2 instances couldn’t connect to the database, because the file written out by the LaunchConfig still contained the old DB connection information. For non-production stacks where the entire stack can be deleted and recreated instead of updating the RDS resource, this shouldn’t be an issue. Otherwise, if changes are made that replace the RDS instance, I’d recommend watching for the LaunchConfig update completion, and manually terminating instances (or increasing the size of the ASG to add instances) to ensure that the running EC2 instances have the updated LaunchConfig.
Another option would be to use the cfn-hup daemon to listen for stack updates that cause changes in resource metadata, and perform the required actions without needing the rolling update to replace the instances.
How to Do Things Using the Template Below
I’m currently using the aws
command line tools to perform stack creation and updates,
wrapped in a Rakefile (I plan on changing this to use boto
inside a Jenkins job). What follows is a quick high-level guide
on how to accomplish various RDS-related tasks, using the template snippet below.
-
Build a new stack using a RDS snapshot and a stack policy to prevent updates:
$ cat /tmp/stack_policy.json { "Statement" : [ { "Effect" : "Deny", "Action" : "Update:*", "Principal": "*", "Resource" : "LogicalResourceId/DBInstance" }, { "Effect" : "Allow", "Action" : "Update:*", "Principal": "*", "Resource" : "*" } ] } $ aws cloudformation create-stack --stack-name mystack --stack-policy-body file:///tmp/stack_policy.json --template-body file:///home/myuser/cloudformation_template.json --parameters ParameterKey=DBSnapshotIdentifier,ParameterValue='my-snapshot-identifier'
-
Temporarily override stack policy to allow updates:
-
Create a file with the following contents (we’ll assume it’s at
/home/myuser/allow_all_updates.json
):{ "Statement" : [ { "Effect" : "Allow", "Action" : "Update:*", "Principal": "*", "Resource" : "*" } ] }
-
In the following
aws
commands, append--stack-policy-during-update-body file:///home/myuser/allow_all_updates.json
-
-
Update a stack (built using a RDS snapshot), without losing data:
$ aws cloudformation update-stack --stack-name mystack --template-body file:///home/myuser/cloudformation_template.json --parameters ParameterKey=DBSnapshotIdentifier,UsePreviousValue=true
-
Load a RDS snapshot into an existing stack (that isn’t already using this snapshot):
$ aws cloudformation update-stack --stack-name mystack --template-body file:///home/myuser/cloudformation_template.json --parameters ParameterKey=DBSnapshotIdentifier,ParameterValue='my-snapshot-identifier'
-
Load a RDS snapshot into an existing stack again (i.e. restore from the same snapshot a second time; this one is a kludge):
$ # re-create the RDS instance with a blank DB (DBName) $ aws cloudformation update-stack --stack-name mystack --template-body file:///home/myuser/cloudformation_template.json --parameters ParameterKey=DBSnapshotIdentifier,ParameterValue='' $ # then load the snapshot again $ aws cloudformation update-stack --stack-name mystack --template-body file:///home/myuser/cloudformation_template.json --parameters ParameterKey=DBSnapshotIdentifier,ParameterValue='my-snapshot-identifier'
CloudFormation Template Snippet
This is by no means complete, but just includes the parameters, conditions, and resources which I make reference to.
{
"Parameters" : {
"DBName" : {
"Default": "wordpress",
"Description" : "The WordPress database name",
"Type": "String",
"MinLength": "1",
"MaxLength": "64",
"AllowedPattern" : "[a-zA-Z][a-zA-Z0-9]*",
"ConstraintDescription" : "must begin with a letter and contain only alphanumeric characters."
},
"DBSnapshotIdentifier" : {
"Description" : " The RDS MySQL snapshot name to restore to the new DB instance.",
"Type": "String",
"Default": ""
},
},
"Conditions" : {
"UseDbSnapshot" : {
"Fn::Not" : [{
"Fn::Equals" : [
{"Ref" : "DBSnapshotIdentifier"},
""
]
}]
}
},
"Resources" : {
"DBInstance" : {
"Type": "AWS::RDS::DBInstance",
"Properties": {
"DBName" : {
"Fn::If" : [
"UseDbSnapshot",
{ "Ref" : "AWS::NoValue"},
{ "Ref" : "DBName" }
]
},
"Engine" : "MySQL",
"MasterUsername" : { "Ref" : "DBUsername" },
"DBInstanceClass" : { "Ref" : "DBClass" },
"DBSecurityGroups" : [{ "Ref" : "DBSecurityGroup" }],
"DBSubnetGroupName": { "Ref": "DBSubnetGroup" },
"AllocatedStorage" : { "Ref" : "DBAllocatedStorage" },
"MasterUserPassword" : { "Ref" : "DBPassword" },
"DBSnapshotIdentifier" : {
"Fn::If" : [
"UseDbSnapshot",
{ "Ref" : "DBSnapshotIdentifier" },
{ "Ref" : "AWS::NoValue"}
]
},
"MultiAZ" : true
},
"DeletionPolicy" : "Snapshot"
},
"WebServerGroup" : {
"Type" : "AWS::AutoScaling::AutoScalingGroup",
"Properties" : {
"LaunchConfigurationName" : { "Ref" : "LaunchConfig" },
},
"UpdatePolicy": {
"AutoScalingRollingUpdate" : {
"MinInstancesInService" : "1",
"MaxBatchSize" : "1",
"WaitOnResourceSignals" : "true",
"PauseTime" : "PT10M"
},
"AutoScalingScheduledAction" : {
"IgnoreUnmodifiedGroupSizeProperties" : true
}
},
"CreationPolicy" : {
"ResourceSignal" : {
"Timeout" : "PT10M",
"Count" : "2"
}
}
},
"LaunchConfig": {
"Type" : "AWS::AutoScaling::LaunchConfiguration",
"Metadata" : {
"AWS::CloudFormation::Init" : {
"config" : {
"files" : {
"/opt/wordpress/cloudformation_db.php" : {
"content" : { "Fn::Join" : ["", [
"<?php\n",
"define('DB_NAME', '", {"Ref" : "DBName"}, "');\n",
"define('DB_USER', '", {"Ref" : "DBUsername"}, "');\n",
"define('DB_PASSWORD', '", {"Ref" : "DBPassword" }, "');\n",
"define('DB_HOST', '", {"Fn::GetAtt" : ["DBInstance", "Endpoint.Address"]},"');\n",
]] },
}
}
}
}
}
}
}
}
Comments
comments powered by Disqus