We’re now getting started with Spot Instances for our EMR clusters. We’re not yet sure that our jobs are fully resilient, or what would actually happen if some of the EC2 Spot Instances in our EMR clusters were interrupted when EC2 needs the capacity back for On-Demand.
In most cases, when running fault-tolerant workloads, we don’t need to track Spot interruptions, because our applications should be built to handle them gracefully without any impact on performance or availability. However, while we are getting started with running our EMR jobs on Spot Instances, tracking them can be useful: if Spot Instances were interrupted during Spark run time, our organization can correlate the interruptions with possible EMR job failures or prolonged execution times.
Let’s set up an email notification for when Spot interruptions occur, so that if there are any failures in our EMR applications, we’ll be able to check whether they correlate with a Spot interruption.
You now have an SNS topic that CloudWatch Events can send the EC2 Spot Interruption Notification to. Next, configure CloudWatch Events to do so: in the AWS Management Console, go to CloudWatch -> Events -> Rules and click Create rule.
Under Service Name, select EC2, and under Event Type, select EC2 Spot Instance Interruption Warning.
On the right side of the console, click Add Target, scroll down and select SNS topic, then select your topic name. Your result should look like this:
Click Configure Details in the bottom right corner.
Provide a name for your CloudWatch Events rule and click Create rule.
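For reference, the Service Name and Event Type selected in the console steps above correspond to a CloudWatch Events event pattern on the `source` and `detail-type` fields. The sketch below writes that pattern as a Python dictionary and prints it as JSON. The `matches_pattern` helper is a hypothetical, heavily simplified stand-in for the service's matching logic: it only handles flat top-level fields and is meant to illustrate what the rule selects, not to reproduce how CloudWatch Events evaluates patterns.

```python
import json

# Event pattern equivalent to choosing Service Name "EC2" and
# Event Type "EC2 Spot Instance Interruption Warning" in the console.
SPOT_INTERRUPTION_PATTERN = {
    "source": ["aws.ec2"],
    "detail-type": ["EC2 Spot Instance Interruption Warning"],
}


def matches_pattern(event: dict, pattern: dict) -> bool:
    """Hypothetical, simplified matcher for illustration only.

    Every key in the pattern must be present in the event, and the
    event's value must be one of the values listed for that key.
    Nested fields and advanced matching are not handled.
    """
    return all(event.get(key) in allowed for key, allowed in pattern.items())


if __name__ == "__main__":
    # Print the pattern in the JSON form shown by the console.
    print(json.dumps(SPOT_INTERRUPTION_PATTERN, indent=2))
```

With this pattern, other EC2 events such as instance state changes do not match, so only Spot interruption warnings reach the SNS topic.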
A straightforward way to simulate a Spot Interruption Notification is to use Spot Fleet. Spot Fleet is an EC2 instance provisioning and management tool (not to be confused with EMR Instance Fleets) that is not used in this workshop for any of the actual EMR/Spark work. We will only use Spot Fleet to trigger a Spot interruption, so that you can verify that the notification you set up works.
{"version":"0","id":"6009a9f4-cc7a-8a77-46f2-310520b31e0f","detail-type":"EC2 Spot Instance Interruption Warning","source":"aws.ec2","account":"<account-id>","time":"2019-05-27T04:52:57Z","region":"eu-west-1","resources":["arn:aws:ec2:eu-west-1b:instance/i-0481ef86f172b68d7"],"detail":{"instance-id":"i-0481ef86f172b68d7","instance-action":"terminate"}}
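To correlate a notification like the one above with an EMR job, the message can be parsed and its timestamp compared against the job's run window. The following is a minimal sketch using only the Python standard library. The function names and the job start and end times are hypothetical, chosen for illustration; in practice the run window would come from your own EMR step or Spark application history.

```python
import json
from datetime import datetime, timezone

# The sample notification from this section, as received in the email body.
SAMPLE_MESSAGE = r'''{"version":"0","id":"6009a9f4-cc7a-8a77-46f2-310520b31e0f","detail-type":"EC2 Spot Instance Interruption Warning","source":"aws.ec2","account":"<account-id>","time":"2019-05-27T04:52:57Z","region":"eu-west-1","resources":["arn:aws:ec2:eu-west-1b:instance/i-0481ef86f172b68d7"],"detail":{"instance-id":"i-0481ef86f172b68d7","instance-action":"terminate"}}'''


def parse_interruption(message: str) -> dict:
    """Extract the fields that are useful for correlation."""
    event = json.loads(message)
    # The "time" field is a UTC timestamp for when the warning was issued.
    warned_at = datetime.strptime(event["time"], "%Y-%m-%dT%H:%M:%SZ")
    return {
        "instance_id": event["detail"]["instance-id"],
        "action": event["detail"]["instance-action"],
        "region": event["region"],
        "time": warned_at.replace(tzinfo=timezone.utc),
    }


def interrupted_during(warned_at: datetime, job_start: datetime, job_end: datetime) -> bool:
    """Return True if the warning was issued while the job was running."""
    return job_start <= warned_at <= job_end


if __name__ == "__main__":
    info = parse_interruption(SAMPLE_MESSAGE)
    # Hypothetical job window, for illustration only.
    job_start = datetime(2019, 5, 27, 4, 30, tzinfo=timezone.utc)
    job_end = datetime(2019, 5, 27, 5, 15, tzinfo=timezone.utc)
    print(info["instance_id"], info["action"], info["time"].isoformat())
    print("Interrupted during job:", interrupted_during(info["time"], job_start, job_end))
```

Note that the timestamp marks when the warning was issued; the instance itself is reclaimed roughly two minutes later, so a job that ends shortly after the warning may still have been affected.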
Go ahead and cancel the fleet request itself: select the checkbox next to the fleet, then click Actions -> Cancel Spot request -> Confirm.
From now on, any EC2 Spot interruption in the account and region where you set this up will alert you via email. Disable or delete the CloudWatch Events rule if you no longer want the notifications.