Prerequisites and initial steps

General requirements and notes:

  1. This workshop is self-paced. The instructions will walk you through achieving the workshop’s goal using the AWS Management Console.
  2. Although the workshop provides step-by-step instructions, take a moment at each step to look around and understand what is happening; this will enhance your learning experience. The workshop is meant as a getting-started guide, but you will learn the most by digesting each step and thinking about how it would apply in your own environment and organization. To challenge yourself, you can also experiment with the steps.

Preparation steps:

Follow the steps for the environment in which you are running the workshop.

If you are running the workshop in your own AWS account:

  1. Create an S3 bucket. We will use it to store the Spark application code (provided later) and the Spark application’s results.
    For instructions, see the Create a Bucket page in the Amazon S3 Getting Started Guide.

  2. Deploy a new VPC in which to run the workshop’s EMR cluster.
    a. Open the “Modular and Scalable VPC Architecture” Quick Start page, go to the “How to deploy” tab, and click the “Launch the Quick Start” link.
    b. In the top-right corner of the AWS Management Console, select the Region in which you want to run the workshop, then click Next.
    c. Enter a name for the stack, or keep the default, Quick-Start-VPC.
    d. Under Availability Zones, select three Availability Zones from the list, and set Number of Availability Zones to 3.
    e. Under Create private subnets, select false.
    f. Click Next, and then click Next again on the following screen.
    g. Click Create stack.
    Stack creation should take under 2 minutes; when it finishes, the stack status is CREATE_COMPLETE.
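If you prefer to script step 2 rather than click through the console, the stack can also be launched with the AWS SDK for Python (boto3). The following is a minimal, hypothetical sketch, not part of the workshop: TEMPLATE_URL is a placeholder that you must replace with the template URL behind the “Launch the Quick Start” link, and the parameter keys AvailabilityZones, NumberOfAZs, and CreatePrivateSubnets are assumptions about the Quick Start template that you should check against the template itself.

```python
"""Hypothetical sketch: launch the VPC Quick Start stack with boto3.

Assumptions (not from the workshop): TEMPLATE_URL is a placeholder, and the
parameter keys AvailabilityZones, NumberOfAZs and CreatePrivateSubnets are
assumed; check both against the Quick Start template before running.
"""

# Placeholder only: replace with the template URL behind "Launch the Quick Start".
TEMPLATE_URL = "https://example.com/REPLACE-ME/vpc-quickstart.template.yaml"


def build_stack_parameters(azs, create_private_subnets=False):
    """Build CloudFormation parameters that mirror steps d and e."""
    if len(azs) != 3:
        raise ValueError("this workshop uses exactly three Availability Zones")
    return [
        # Comma-separated AZ names, e.g. "us-east-1a,us-east-1b,us-east-1c".
        {"ParameterKey": "AvailabilityZones", "ParameterValue": ",".join(azs)},
        # Must agree with the number of AZs passed above.
        {"ParameterKey": "NumberOfAZs", "ParameterValue": str(len(azs))},
        # Step e: do not create private subnets ("false").
        {"ParameterKey": "CreatePrivateSubnets",
         "ParameterValue": str(create_private_subnets).lower()},
    ]


def deploy_vpc_stack(region, stack_name="Quick-Start-VPC"):
    """Create the stack and wait until it reaches CREATE_COMPLETE."""
    import boto3  # imported here so build_stack_parameters runs without boto3

    ec2 = boto3.client("ec2", region_name=region)
    zones = ec2.describe_availability_zones(
        Filters=[
            {"Name": "state", "Values": ["available"]},
            {"Name": "zone-type", "Values": ["availability-zone"]},
        ]
    )["AvailabilityZones"]
    # Take the first three zones the API returns.
    azs = [zone["ZoneName"] for zone in zones[:3]]

    cfn = boto3.client("cloudformation", region_name=region)
    # If the template creates IAM resources, also pass Capabilities=["CAPABILITY_IAM"].
    cfn.create_stack(
        StackName=stack_name,
        TemplateURL=TEMPLATE_URL,
        Parameters=build_stack_parameters(azs),
    )
    # The waiter raises botocore.exceptions.WaiterError if creation fails.
    cfn.get_waiter("stack_create_complete").wait(StackName=stack_name)
    return cfn.describe_stacks(StackName=stack_name)["Stacks"][0]["StackStatus"]
```

With valid credentials and the real template URL filled in, a call such as deploy_vpc_stack("us-east-1") would block until the stack finishes and then return its status. Keeping the parameter building in a separate function lets you inspect exactly what will be sent to CloudFormation before creating anything.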

If you are running the workshop in an AWS-provided workshop account:

Create an S3 bucket. We will use it to store the Spark application code (provided later) and the Spark application’s results.
For instructions, see the Create a Bucket page in the Amazon S3 Getting Started Guide.
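The bucket step is the same in either environment, and it can also be done with boto3. This is a minimal sketch rather than a workshop step. The name check covers only the basic S3 naming rules: 3 to 63 characters; lowercase letters, digits, dots, and hyphens; and starting and ending with a letter or digit. Bucket names must also be globally unique. The sketch handles one real quirk of the S3 API: in us-east-1, create_bucket must be called without a LocationConstraint, while other Regions need one. The function names are hypothetical.

```python
"""Hypothetical sketch: create the workshop S3 bucket with boto3."""
import re

# Basic S3 naming rules only (not the full rule set): 3-63 characters,
# lowercase letters, digits, dots and hyphens, letter/digit at both ends.
_BUCKET_NAME_RE = re.compile(r"[a-z0-9][a-z0-9.-]{1,61}[a-z0-9]")


def is_plausible_bucket_name(name):
    """Return True if the name passes the basic checks above."""
    return bool(_BUCKET_NAME_RE.fullmatch(name)) and ".." not in name


def create_bucket_kwargs(bucket, region):
    """Build create_bucket arguments; us-east-1 must omit LocationConstraint."""
    if not is_plausible_bucket_name(bucket):
        raise ValueError(f"invalid bucket name: {bucket!r}")
    kwargs = {"Bucket": bucket}
    if region != "us-east-1":
        kwargs["CreateBucketConfiguration"] = {"LocationConstraint": region}
    return kwargs


def create_workshop_bucket(bucket, region):
    """Create the bucket and wait until S3 reports that it exists."""
    import boto3  # imported lazily so the helpers above run without boto3

    s3 = boto3.client("s3", region_name=region)
    s3.create_bucket(**create_bucket_kwargs(bucket, region))
    # Block until the bucket is visible before uploading application code.
    s3.get_waiter("bucket_exists").wait(Bucket=bucket)
```

For example, create_workshop_bucket("my-emr-workshop-123456", "eu-west-1") would create a bucket with that (hypothetical) name in Europe (Ireland). Create the bucket in the same Region you use for the rest of the workshop.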

You don’t need to create a VPC, as the workshop account already has a default VPC that we will use in this workshop.
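If you want to confirm that the default VPC exists and see which subnets it offers for the EMR cluster, you can list them with boto3. This is an optional, hypothetical check and not a workshop step. The parsing helpers rely only on the documented response shapes of the EC2 describe_vpcs and describe_subnets calls, so you can try them without AWS credentials.

```python
"""Hypothetical sketch: find the default VPC and its subnets with boto3."""


def default_vpc_id(describe_vpcs_response):
    """Return the VpcId of the default VPC, or None if there is none."""
    for vpc in describe_vpcs_response.get("Vpcs", []):
        if vpc.get("IsDefault"):
            return vpc["VpcId"]
    return None


def subnet_ids(describe_subnets_response):
    """Return subnet IDs ordered by Availability Zone name."""
    subnets = sorted(describe_subnets_response.get("Subnets", []),
                     key=lambda subnet: subnet["AvailabilityZone"])
    return [subnet["SubnetId"] for subnet in subnets]


def find_default_vpc(region):
    """Return (vpc_id, [subnet_id, ...]) for the Region's default VPC."""
    import boto3  # imported lazily so the parsing helpers run without boto3

    ec2 = boto3.client("ec2", region_name=region)
    vpc_id = default_vpc_id(ec2.describe_vpcs())
    if vpc_id is None:
        raise RuntimeError(f"no default VPC found in {region}")
    subnets = ec2.describe_subnets(
        Filters=[{"Name": "vpc-id", "Values": [vpc_id]}]
    )
    return vpc_id, subnet_ids(subnets)
```

Calling find_default_vpc with your workshop Region returns the default VPC ID and its subnet IDs. A RuntimeError would mean that the account has no default VPC in that Region.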

Congratulations! You have completed the prerequisites for the workshop. You now have a VPC to run your EMR cluster in and an S3 bucket for the Spark application code and results. Continue to the next step of the workshop.