I really like using AWS CodePipeline for my AWS deployments, but in my opinion it is missing one feature for multi-region deployments: the ability to let your changes bake for a prescribed amount of time, while watching your service's alarms, before deploying to the next region.
Sure, you could fairly easily use CodeBuild with a small script that waits while monitoring your alarms, but that always seemed wasteful of compute to me. So instead I came up with a really small Step Functions state machine. All it does is take an input in minutes; once a minute it checks all alarms, and if any of them is in the ALARM state it fails.
As you might notice, there is no dedicated compute for this function. It makes 3 state transitions per minute in total (DescribeAlarms, Choice, Wait), so an hour of bake time is about 180 transitions. That costs roughly $0.005, assuming the Standard Workflow rate of $0.025 per 1,000 transitions, once you have used up the 4,000 state transitions per month in the Step Functions perpetual free tier. Below is the function definition in whole.
{
  "StartAt": "DescribeAlarms",
  "States": {
    "DescribeAlarms": {
      "Type": "Task",
      "Arguments": {
        "AlarmTypes": [
          "CompositeAlarm",
          "MetricAlarm"
        ]
      },
      "Resource": "arn:aws:states:::aws-sdk:cloudwatch:describeAlarms",
      "Output": {
        "Duration": "{% $states.input.Duration - 1 %}",
        "InAlarm": "{% $count($states.result.MetricAlarms[StateValue = 'ALARM']) + $count($states.result.CompositeAlarms[StateValue = 'ALARM']) %}"
      },
      "Next": "Choice"
    },
    "Success": {
      "Type": "Succeed"
    },
    "Choice": {
      "Type": "Choice",
      "Choices": [
        {
          "Next": "Fail",
          "Condition": "{% $states.input.InAlarm > 0 %}"
        },
        {
          "Next": "Success",
          "Condition": "{% $states.input.Duration < 0 %}"
        }
      ],
      "Default": "Wait"
    },
    "Fail": {
      "Type": "Fail"
    },
    "Wait": {
      "Type": "Wait",
      "Seconds": 60,
      "Next": "DescribeAlarms"
    }
  },
  "QueryLanguage": "JSONata",
  "Comment": "Waits for a prescribed \"Duration\" minutes and will fail if any alarm goes into alarming state."
}
You would invoke it with a payload looking like this.
{ "Duration": 10 }
Where 10 is the number of minutes you wish for it to wait.
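From CodePipeline, the same payload can be passed through the built-in Step Functions invoke action. Here is a rough sketch of what such an action declaration might look like in the pipeline definition. The action name, account ID, region and state machine name are placeholders I made up for illustration, so swap in your own.

```json
{
  "name": "BakeTime",
  "actionTypeId": {
    "category": "Invoke",
    "owner": "AWS",
    "provider": "StepFunctions",
    "version": "1"
  },
  "configuration": {
    "StateMachineArn": "arn:aws:states:us-east-1:111122223333:stateMachine:BakeTime",
    "InputType": "Literal",
    "Input": "{\"Duration\": 10}"
  },
  "runOrder": 1
}
```

You would typically place an action like this in its own stage between two regional deploy stages, so that a failed execution stops the pipeline before the next region is touched.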
This version fails when any CloudWatch alarm it finds is in the ALARM state, but you could easily modify the InAlarm expression or the Arguments of the DescribeAlarms state above to exclude alarms that should not fail your bake, such as alarms used for auto scaling, which do not indicate service issues.
The only permission the function needs is cloudwatch:DescribeAlarms, in addition to the normal Step Functions permissions. When using it in CodePipeline, you might also need to give the pipeline's execution role permission to run your state machine.
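As one possible way to do that, the sketch below swaps in a DescribeAlarms state whose InAlarm expression skips metric alarms with "TargetTracking-" in their name. This assumes your scaling alarms are the ones created by target tracking policies, which get names starting with that prefix. If your alarms follow a different naming scheme, adjust the string accordingly. The fragment is wrapped in braces only so it is valid JSON on its own; it replaces the state of the same name in the definition.

```json
{
  "DescribeAlarms": {
    "Type": "Task",
    "Arguments": {
      "AlarmTypes": [
        "CompositeAlarm",
        "MetricAlarm"
      ]
    },
    "Resource": "arn:aws:states:::aws-sdk:cloudwatch:describeAlarms",
    "Output": {
      "Duration": "{% $states.input.Duration - 1 %}",
      "InAlarm": "{% $count($states.result.MetricAlarms[StateValue = 'ALARM' and $not($contains(AlarmName, 'TargetTracking-'))]) + $count($states.result.CompositeAlarms[StateValue = 'ALARM']) %}"
    },
    "Next": "Choice"
  }
}
```

If you would rather only watch the alarms belonging to one service, another option is to add an "AlarmNamePrefix" entry to Arguments, provided your alarm names share a common prefix.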
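For the pipeline side, a policy statement along these lines is probably what you want on the pipeline's role. Treat it as a sketch: the account ID, region and state machine name are placeholders, and the three actions listed are the ones I would expect the Step Functions invoke action to need (start the execution, then poll it until it finishes).

```json
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": [
        "states:StartExecution",
        "states:DescribeStateMachine"
      ],
      "Resource": "arn:aws:states:us-east-1:111122223333:stateMachine:BakeTime"
    },
    {
      "Effect": "Allow",
      "Action": "states:DescribeExecution",
      "Resource": "arn:aws:states:us-east-1:111122223333:execution:BakeTime:*"
    }
  ]
}
```

The state machine's own role stays minimal: an Allow for cloudwatch:DescribeAlarms is all the definition above calls.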