Under “Instance group configuration”, select Instance Fleets. Under Network, select the VPC that you deployed using the CloudFormation template earlier in the workshop, and select all subnets in the VPC. Even when you select multiple subnets, the EMR cluster is still started in a single Availability Zone, but EMR Instance Fleets will choose the best instance type based on available capacity and price across the Availability Zones that you specified.
The workshop focuses on running Spot Instances across all the cluster node types for cost savings. If you want to dive deeper into when to use On-Demand and Spot in your EMR clusters, click here
Unless your cluster is very short-lived and the runs are cost-driven, avoid running your Master node on a Spot Instance. We suggest this because a Spot interruption on the Master node terminates the entire cluster.
For the purpose of this workshop, we will run the Master node on a Spot Instance, because we are simulating a relatively short-lived job running on a transient cluster. There will be no business impact if the job fails due to a Spot interruption and is restarted later.
Click Add / remove instance types to fleet and select two relatively small and cheap instance types, e.g. c4.large and m4.large, then check Spot under target capacity. EMR will only provision one instance, but will select the best instance type for the Master node based on price and available capacity.
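For readers who prefer to see the console choices as data, the Master fleet above can be sketched in the shape that boto3's EMR `run_job_flow` call accepts under `Instances["InstanceFleets"]`. This is only an illustrative sketch, not part of the workshop steps: the fleet name is a hypothetical label, and nothing here calls AWS.

```python
# Sketch of the Master Instance Fleet as a plain dictionary.
# The keys follow the InstanceFleets structure of boto3's EMR run_job_flow;
# the "Name" value is a hypothetical label chosen for this example.
master_fleet = {
    "Name": "master-fleet",
    "InstanceFleetType": "MASTER",
    # Only Spot capacity is requested, and only one instance is provisioned.
    "TargetSpotCapacity": 1,
    # EMR picks one of these types based on price and available capacity.
    "InstanceTypeConfigs": [
        {"InstanceType": "c4.large"},
        {"InstanceType": "m4.large"},
    ],
}

candidate_types = [cfg["InstanceType"] for cfg in master_fleet["InstanceTypeConfigs"]]
print(candidate_types)
```

Listing more than one instance type is what gives EMR room to choose; with a single type, the fleet could only wait for that one capacity pool.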
Avoid using Spot Instances for Core nodes if your Spark applications use HDFS. This prevents Spot interruptions from causing the loss of data that was written to the HDFS volumes on those instances. Because this workshop runs a short-lived application on a transient cluster, we are going to run our Core nodes on Spot Instances.
When using EMR Instance Fleets, one Core node is mandatory. Since we want to scale out and run our Spark application on our Task nodes, let’s stick to the one mandatory Core node. We will specify 4 Spot units and select instance types that each count as 4 units and can run one executor.
Under the core node type, click Add / remove instance types to fleet and select instance types that have 4 vCPUs and enough memory to run an executor (given the 18G executor size), for example:
Our task nodes will only run Spark executors and no HDFS DataNodes, so this is a great fit for scaling out and increasing the parallelization of our application’s execution, to achieve faster execution times.
Under the task node type, click Add / remove instance types to fleet and select the 5 instance types you noted before as suitable for our executor size and that had suitable interruption rates in the Spot Instance Advisor.
Since our executor size is 4 vCPUs, and each instance counts as the number of its vCPUs towards the total units, let’s specify 40 Spot units in order to run 10 executors, and allow EMR to select the best instance type in the Task Instance Fleet to run the executors on. In this example, EMR will start either 10 instances of r4.xlarge / r5.xlarge / i3.xlarge or 5 instances of r5.2xlarge / r4.2xlarge in the Task Instance Fleet.
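The unit arithmetic above can be checked with a short sketch. It assumes, as the text does, that each instance counts as its vCPU count (4 for the xlarge sizes, 8 for the 2xlarge sizes) and that the fleet is fulfilled with a single instance type; the function and variable names are made up for this example.

```python
import math

# vCPUs per instance type, which is also the number of units each one counts as.
UNITS_PER_INSTANCE = {
    "r4.xlarge": 4,
    "r5.xlarge": 4,
    "i3.xlarge": 4,
    "r4.2xlarge": 8,
    "r5.2xlarge": 8,
}

EXECUTOR_VCPUS = 4       # executor size used in this workshop
TARGET_SPOT_UNITS = 40   # Spot units requested for the Task Instance Fleet


def instances_needed(target_units: int, units_per_instance: int) -> int:
    """Number of instances of one type required to reach the target units."""
    return math.ceil(target_units / units_per_instance)


# How many executors the requested units can host.
executors = TARGET_SPOT_UNITS // EXECUTOR_VCPUS

# Instance count for each candidate type if the fleet used only that type.
options = {
    instance_type: instances_needed(TARGET_SPOT_UNITS, units)
    for instance_type, units in UNITS_PER_INSTANCE.items()
}

print(f"executors: {executors}")
for instance_type, count in options.items():
    print(f"{instance_type}: {count} instances")
```

Either way the fleet ends up with 40 vCPUs for executors; only the number of instances differs.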
If you are using a new AWS account, or an account in which Spot Instances have never been launched, your ability to launch Spot Instances will be limited. To work around this, please make sure you launch no more than 3 instances in the Task Instance Fleet. You can do this, for example, by only providing instance types that count as 8 units and specifying 24 for Spot units.
If your Task Instance Fleet is stuck on provisioning, try lowering the number of requested instances further. Your Spark application should still complete successfully, but it might take longer because there are fewer executors in the cluster.
Click Next to continue to the next steps of launching your EMR cluster.
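As a rough illustration of that workaround, the sketch below describes a Task fleet that offers only 8-unit instance types with a 24-unit target, and confirms that it cannot exceed 3 instances. The dictionary follows the `InstanceFleets` shape used by boto3's EMR `run_job_flow`; the fleet name, the helper function, and the choice of the two 2xlarge types are assumptions made for this example, and nothing here calls AWS.

```python
import math

# Task fleet restricted to instance types that each count as 8 units.
# "Name" is a hypothetical label; the instance types are an assumed selection.
limited_task_fleet = {
    "Name": "task-fleet",
    "InstanceFleetType": "TASK",
    "TargetSpotCapacity": 24,
    "InstanceTypeConfigs": [
        {"InstanceType": "r4.2xlarge", "WeightedCapacity": 8},
        {"InstanceType": "r5.2xlarge", "WeightedCapacity": 8},
    ],
}


def worst_case_instance_count(fleet: dict) -> int:
    """Largest instance count the fleet could need, using its smallest type."""
    smallest_weight = min(
        cfg["WeightedCapacity"] for cfg in fleet["InstanceTypeConfigs"]
    )
    return math.ceil(fleet["TargetSpotCapacity"] / smallest_weight)


max_instances = worst_case_instance_count(limited_task_fleet)
print(f"at most {max_instances} instances")
```

Basing the check on the smallest weight in the fleet is the conservative choice: if a 4-unit type were added to the list, the same 24 units could require 6 instances and exceed the limit.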