Running Spark apps with EMR on Spot Instances

Overview

Welcome! In this workshop you assume the role of a data engineer tasked with optimizing your organization’s costs for running Spark applications on Amazon EMR and EC2 Spot Instances. You learn to apply best practices, such as instance diversification, right-sizing Spark executors, and the EMR allocation strategy, to cost-optimize Spark applications running on Amazon EMR with Spot Instances.

Estimated time and cost to run this workshop

The estimated time to complete the workshop is 60-90 minutes, and the estimated cost of running the workshop’s resources in your AWS account is less than $2.

Prerequisites

This is a level 400 workshop. We expect readers to be already familiar with the fundamentals of Apache Spark, Amazon EMR, and EC2 Spot Instances.