Now that we understand the basics of our trading strategy, let’s get our hands dirty building out the batch processing pipeline.
We will start by creating a managed message queue to store the batch job parameters.
Go to the SQS Console. If you haven’t used the service in this region, click Get Started Now; otherwise, click Create New Queue.
Name your queue “workshop”. Select Standard Queue. Click Quick-Create Queue.
Note that the queue name is case sensitive.
For regions that don’t yet support FIFO queues, the console may look different than shown. Just name the queue and accept the defaults.
Save the queue ARN and URL for later use.
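The same queue can also be created from code. The following is a minimal sketch, assuming boto3 is installed and credentials are configured. The helper names are hypothetical, and `queue_arn_from_url` assumes the standard `https://sqs.<region>.amazonaws.com/<account-id>/<queue-name>` URL format in the standard `aws` partition.

```python
from urllib.parse import urlparse


def queue_arn_from_url(queue_url: str) -> str:
    """Derive a queue ARN from a standard-format SQS queue URL.

    Assumes the URL looks like
    https://sqs.<region>.amazonaws.com/<account-id>/<queue-name>.
    """
    parsed = urlparse(queue_url)
    host_parts = parsed.netloc.split(".")
    if len(host_parts) < 4 or host_parts[0] != "sqs":
        raise ValueError(f"Unrecognized SQS queue URL: {queue_url}")
    region = host_parts[1]
    account_id, queue_name = parsed.path.strip("/").split("/")
    return f"arn:aws:sqs:{region}:{account_id}:{queue_name}"


def create_workshop_queue(region: str, name: str = "workshop") -> dict:
    """Create a standard queue and return its URL and ARN.

    Needs boto3 and AWS credentials; boto3 is imported lazily so the
    pure helper above can be used without it.
    """
    import boto3

    sqs = boto3.client("sqs", region_name=region)
    queue_url = sqs.create_queue(QueueName=name)["QueueUrl"]
    attrs = sqs.get_queue_attributes(
        QueueUrl=queue_url, AttributeNames=["QueueArn"]
    )
    return {"url": queue_url, "arn": attrs["Attributes"]["QueueArn"]}
```

Calling `create_workshop_queue("us-east-1")` would return both values to save for the later steps. Asking the service for `QueueArn` is the more reliable route; the URL-parsing helper is only a convenience when you have the URL at hand.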
Our EC2 instances run with an Instance Profile that contains an IAM role giving the instance permissions to interact with other AWS services. We need to add permissions for the SQS service to the associated policy.
Go to the EC2 Console.
Under Instances, select the instance named montecarlo-workshop.
Scroll down and select the IAM Role.
You should see two attached policies. One will be an inline policy named after the workshop. Click the arrow beside the policy and click Edit policy.
Click on Add additional permissions. Click on Choose a service and select or type SQS.
Click on Select actions. Under Manual actions, check the box beside All SQS actions (sqs:*).
You will see a warning that you must choose a queue resource type. Click anywhere on the orange warning line. Under Resources, click on Add ARN.
In the pop-up window, paste the ARN that you saved previously. Click Add.
Click on Review Policy and then click Save changes.
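For reference, the console steps above amount to appending one statement to the inline policy. The sketch below builds that statement with the standard library only. The existing policy document and the account ID in the ARN are hypothetical placeholders, not values from the workshop.

```python
import json


def sqs_policy_statement(queue_arn: str) -> dict:
    """Build a statement allowing all SQS actions on a single queue."""
    return {
        "Effect": "Allow",
        "Action": "sqs:*",
        "Resource": queue_arn,
    }


def add_statement(policy_document: dict, statement: dict) -> dict:
    """Return a copy of the policy document with the statement appended."""
    existing = policy_document.get("Statement", [])
    if isinstance(existing, dict):  # IAM also allows a single statement object
        existing = [existing]
    updated = dict(policy_document)
    updated["Statement"] = list(existing) + [statement]
    return updated


# Hypothetical existing inline policy and queue ARN, for illustration only.
existing_policy = {"Version": "2012-10-17", "Statement": []}
queue_arn = "arn:aws:sqs:us-east-1:123456789012:workshop"
updated_policy = add_statement(existing_policy, sqs_policy_statement(queue_arn))
print(json.dumps(updated_policy, indent=2))
```

The resulting document could be written back with the IAM `put_role_policy` call (passing the JSON string as `PolicyDocument`). Scoping `Resource` to the saved queue ARN mirrors the Add ARN step in the console.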
The CloudFormation template deployed a web server that will serve as the user interface. We need to configure it with our SQS queue.
Go to the SQS Console and select your queue.
Under Queue Actions, select View/Delete Messages.
Click on Start Polling for Messages.
You should see the message that was created by the web client. Explore the message attributes to see what we will be passing to the worker script.
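The message attributes can also be inspected from code. This is a rough sketch, assuming boto3 and credentials are available; the function names are hypothetical, and the actual attribute names depend on what the web client sends.

```python
def flatten_attributes(message: dict) -> dict:
    """Map each message attribute name to its string (or binary) value."""
    flat = {}
    for name, attr in message.get("MessageAttributes", {}).items():
        # String and Number attributes both carry their value in StringValue.
        flat[name] = attr.get("StringValue", attr.get("BinaryValue"))
    return flat


def peek_messages(queue_url: str, region: str, max_messages: int = 1) -> list:
    """Receive messages without deleting them.

    Needs boto3 and AWS credentials. Received messages stay hidden from
    other consumers until the queue's visibility timeout expires.
    """
    import boto3

    sqs = boto3.client("sqs", region_name=region)
    response = sqs.receive_message(
        QueueUrl=queue_url,
        MaxNumberOfMessages=max_messages,
        MessageAttributeNames=["All"],
        WaitTimeSeconds=5,
    )
    return [flatten_attributes(m) for m in response.get("Messages", [])]
```

Requesting `MessageAttributeNames=["All"]` matters here: without it, `receive_message` returns the body but omits the attributes.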
Now that we have messages on the queue, let’s spin up some workers on EC2 Spot Instances.
From the EC2 Console, select Launch Template and click Create Launch Template.
Leave Create New Template selected. Name the template MonteCarlo-Workshop-Template, and use the same name for the template version description.
In the Launch template content section, click Search for AMI and select the default Amazon Linux 2 HVM 64-bit (x86) AMI.
For Key pair name, choose the SSH Key Pair that you specified in the CloudFormation template.
Select the VPC named VPC for Spot Monte Carlo Simulation Workshop. Under Security groups, select the group with the prefix spot-montecarlo workshop.
Leave the rest at the defaults. At the bottom of the page, click Advanced Details. For IAM instance profile, select the one with the prefix spot-montecarlo workshop.
Finally, paste the following into the User Data field:
#!/bin/bash -e
# Install Dependencies
yum -y install git python3 python3-pip jq
pip3 install --upgrade pandas-datareader yfinance scipy boto3 awscli matplotlib numpy pandas
#Populate Variables
echo 'Populating Variables'
REGION=`curl -s http://169.254.169.254/latest/dynamic/instance-identity/document|grep region|awk -F\" '{print $4}'`
mkdir -p /home/ec2-user/spotlabworker
chown ec2-user:ec2-user /home/ec2-user/spotlabworker
cd /home/ec2-user/spotlabworker
STACK_NAME=$(aws cloudformation --region $REGION list-stacks | jq -r '.StackSummaries[] | select(.TemplateDescription == "Environment for running EC2 Spot Monte Carlo Workshop" and .StackStatus == ("CREATE_COMPLETE", "UPDATE_COMPLETE")).StackName')
WEBURL=$(aws cloudformation --region $REGION describe-stacks --stack-name $STACK_NAME | jq -r '.Stacks[0].Outputs[] | select(.OutputKey == "WebInterface" ).OutputValue ')
if [[ -z "$WEBURL" || -z "$STACK_NAME" ]]; then
    echo "URL: $WEBURL or Stack $STACK_NAME not defined"
    exit 1
else
    echo "Region is $REGION"
    echo "URL is $WEBURL"
fi
echo "Downloading worker code"
wget -q $WEBURL/static/queue_processor.py || { echo 'wget failed' ; exit 1; }
wget -q $WEBURL/static/worker.py || { echo 'wget failed' ; exit 1; }
echo 'Starting the worker processor'
python3 /home/ec2-user/spotlabworker/queue_processor.py --region "$REGION" > stdout.txt 2>&1
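If you would rather create the launch template from code than from the console, the fields filled in above map onto a `LaunchTemplateData` structure. The sketch below only builds that structure. The function names and any IDs you pass in are hypothetical; the one detail worth noting is that the EC2 API expects the user data script to be base64-encoded.

```python
import base64


def encode_user_data(script: str) -> str:
    """Base64-encode a user data script, as the launch template API expects."""
    return base64.b64encode(script.encode("utf-8")).decode("ascii")


def launch_template_data(
    ami_id: str,
    key_name: str,
    security_group_id: str,
    instance_profile_name: str,
    user_data_script: str,
) -> dict:
    """Assemble the launch template contents chosen in the console steps."""
    return {
        "ImageId": ami_id,
        "KeyName": key_name,
        "SecurityGroupIds": [security_group_id],
        "IamInstanceProfile": {"Name": instance_profile_name},
        "UserData": encode_user_data(user_data_script),
    }
```

The result could then be passed to the EC2 client's `create_launch_template` call as `LaunchTemplateData`, together with `LaunchTemplateName="MonteCarlo-Workshop-Template"` and a matching `VersionDescription`.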
From the EC2 Console, select Spot Requests and click Request Spot Instances. Then select Flexible Instances.
Select the Launch Template MonteCarlo-Workshop-Template.
In the Network section, select the VPC named VPC for Spot Monte Carlo Simulation Workshop, and select the two available subnets.
Click Maintain Target Capacity and leave the interruption behavior at the default, “Terminate”.
Expand Advanced Configuration and, for Health Check, select Replace unhealthy instances.
For Total target capacity, type 2.
Review the Fleet request settings and the instance types that have been selected. Notice how each entry has a different Spot price. Feel free to untick Apply Recommendations and change the instance types in the fleet to: c4.large, c5.large, m4.large, m5.large, t2.large, t3.large. Leave “Capacity Optimized” as the allocation strategy.
You can read about the Capacity Optimized strategy, and look up the average interruption frequency over the last 30 days for the selected instance types, in the Spot Instance Advisor.
Click Launch.
Wait until the request is fulfilled, capacity shows the specified number of Spot instances, and the status is Active.
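The console choices above correspond roughly to a Spot Fleet request configuration like the one sketched below. This is an illustration only: the function name, the fleet role ARN, and the subnet IDs are placeholders you would supply, and `"$Latest"` is an assumption about which template version to use.

```python
from itertools import product

# Instance types suggested in the fleet settings step.
INSTANCE_TYPES = ["c4.large", "c5.large", "m4.large", "m5.large", "t2.large", "t3.large"]


def spot_fleet_config(
    fleet_role_arn: str,
    subnet_ids: list,
    target_capacity: int = 2,
    template_name: str = "MonteCarlo-Workshop-Template",
) -> dict:
    """Build a 'maintain' Spot Fleet config diversified across types and subnets."""
    # One override per (instance type, subnet) pair gives the
    # capacity-optimized strategy more pools to choose from.
    overrides = [
        {"InstanceType": instance_type, "SubnetId": subnet_id}
        for instance_type, subnet_id in product(INSTANCE_TYPES, subnet_ids)
    ]
    return {
        "IamFleetRole": fleet_role_arn,
        "Type": "maintain",
        "TargetCapacity": target_capacity,
        "AllocationStrategy": "capacityOptimized",
        "InstanceInterruptionBehavior": "terminate",
        "ReplaceUnhealthyInstances": True,
        "LaunchTemplateConfigs": [
            {
                "LaunchTemplateSpecification": {
                    "LaunchTemplateName": template_name,
                    "Version": "$Latest",  # assumption: use the newest version
                },
                "Overrides": overrides,
            }
        ],
    }
```

The dictionary could be passed to the EC2 client's `request_spot_fleet` call as `SpotFleetRequestConfig`. With six instance types and two subnets, the fleet can draw from twelve capacity pools.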
Once the workers come up, they should start processing the SQS messages automatically. Feel free to create some more jobs from the webpage. Check out the S3 bucket to confirm the results are being processed.
In the previous step, we specified two Spot Instances, but what if we need to process more than two jobs at once? In this optional section we will configure auto scaling so that new Spot Instances are created as more jobs get added to the queue.
Go to the CloudWatch console, and click on Alarms.
Click on Create Alarm. Select SQS Metrics.
Scroll down and select ApproximateNumberOfMessagesVisible. Click Next.
First, we will create the alarm for scaling up. Set the threshold to >= 2 messages for 2 consecutive periods. Delete the default notification actions. Click Next, enter a description and a unique name such as Scale Up Spot Fleet, and click Create Alarm.
Repeat these steps for the scale down alarm, using the unique name Scale Down Spot Fleet and setting the threshold to <= 1 message for 3 consecutive periods.
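The two alarms differ only in name, comparison, threshold, and number of periods, which a small helper makes explicit. In this sketch the helper name is hypothetical, and the `Average` statistic and 300-second period are assumptions, since the steps above do not specify them.

```python
def queue_depth_alarm(
    name: str,
    queue_name: str,
    comparison: str,
    threshold: int,
    evaluation_periods: int,
    period_seconds: int = 300,  # assumption: not specified in the lab
) -> dict:
    """Build put_metric_alarm parameters for an SQS queue-depth alarm."""
    return {
        "AlarmName": name,
        "Namespace": "AWS/SQS",
        "MetricName": "ApproximateNumberOfMessagesVisible",
        "Dimensions": [{"Name": "QueueName", "Value": queue_name}],
        "Statistic": "Average",  # assumption: not specified in the lab
        "Period": period_seconds,
        "EvaluationPeriods": evaluation_periods,
        "Threshold": threshold,
        "ComparisonOperator": comparison,
    }


# >= 2 messages for 2 consecutive periods triggers scaling up.
SCALE_UP = queue_depth_alarm(
    "Scale Up Spot Fleet", "workshop", "GreaterThanOrEqualToThreshold", 2, 2
)
# <= 1 message for 3 consecutive periods triggers scaling down.
SCALE_DOWN = queue_depth_alarm(
    "Scale Down Spot Fleet", "workshop", "LessThanOrEqualToThreshold", 1, 3
)
```

Each dictionary could be passed to the CloudWatch client as `put_metric_alarm(**SCALE_UP)`. No alarm actions are set here, matching the step that deletes the default notification actions.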
Return to Spot Requests in the EC2 Console.
Select your fleet and go to the Auto Scaling tab in the bottom pane.
Click Configure. On the next screen, click on Scale Spot Fleet using step or simple scaling policies.
Under the ScaleUp and ScaleDown policies, configure the appropriate alarms under Policy trigger.
Click Save.
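Behind the console, Spot Fleet scaling is handled by Application Auto Scaling: the fleet is registered as a scalable target and each policy is a step scaling policy. The sketch below builds the request parameters only. The function names, fleet request ID, capacity limits, and adjustment sizes are hypothetical values chosen for illustration.

```python
DIMENSION = "ec2:spot-fleet-request:TargetCapacity"


def scalable_target(fleet_request_id: str, min_capacity: int, max_capacity: int) -> dict:
    """Parameters for registering a Spot Fleet as a scalable target."""
    return {
        "ServiceNamespace": "ec2",
        "ResourceId": f"spot-fleet-request/{fleet_request_id}",
        "ScalableDimension": DIMENSION,
        "MinCapacity": min_capacity,
        "MaxCapacity": max_capacity,
    }


def step_policy(fleet_request_id: str, name: str, adjustment: int) -> dict:
    """Parameters for a one-step scaling policy that adds or removes capacity."""
    # A scale-up step covers metric values above the alarm threshold,
    # a scale-down step covers values below it.
    bound = "MetricIntervalLowerBound" if adjustment > 0 else "MetricIntervalUpperBound"
    return {
        "PolicyName": name,
        "ServiceNamespace": "ec2",
        "ResourceId": f"spot-fleet-request/{fleet_request_id}",
        "ScalableDimension": DIMENSION,
        "PolicyType": "StepScaling",
        "StepScalingPolicyConfiguration": {
            "AdjustmentType": "ChangeInCapacity",
            "StepAdjustments": [{bound: 0, "ScalingAdjustment": adjustment}],
        },
    }
```

These dictionaries could be passed to the `application-autoscaling` client's `register_scalable_target` and `put_scaling_policy` calls. `put_scaling_policy` returns a policy ARN, and that ARN is what ties a policy to its CloudWatch alarm when added to the alarm's actions.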
Check your S3 Bucket. In a few minutes, you should see results start appearing in the bucket.
If you monitor the SQS queue for messages, you should see them being picked up by the worker nodes.
In the next lab, we will use AWS Batch to create a managed batch process pipeline. We will reuse our existing queue, so let’s terminate our EC2 Spot worker fleet.
From the EC2 Console, select Spot Requests.
Check the box beside the Spot fleet request containing your worker nodes. The correct request will have a capacity of 2 and the shortest time since it was created.
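Take care not to cancel the Spot fleet request responsible for our workstation node (Jupyter/WebClient). It will have a capacity of 1 and the instance type will be m4.large.
With the worker fleet selected, cancel the request from the Actions menu and choose to terminate its instances.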
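The selection rule described above (capacity of 2, most recently created) can be expressed as a small filter over the output of `describe_spot_fleet_requests`. In this sketch the function name and the sample records are made up for illustration, and only the fields the filter reads are shown.

```python
from datetime import datetime, timezone


def pick_worker_fleet(fleet_configs: list, worker_capacity: int = 2) -> str:
    """Return the ID of the newest active fleet with the worker capacity."""
    candidates = [
        fleet
        for fleet in fleet_configs
        if fleet["SpotFleetRequestState"] == "active"
        and fleet["SpotFleetRequestConfig"]["TargetCapacity"] == worker_capacity
    ]
    if not candidates:
        raise LookupError("No active Spot fleet request matches the worker capacity")
    return max(candidates, key=lambda fleet: fleet["CreateTime"])["SpotFleetRequestId"]


# Made-up sample records shaped like entries of "SpotFleetRequestConfigs".
SAMPLE_FLEETS = [
    {
        "SpotFleetRequestId": "sfr-workstation",
        "SpotFleetRequestState": "active",
        "CreateTime": datetime(2024, 1, 1, 9, 0, tzinfo=timezone.utc),
        "SpotFleetRequestConfig": {"TargetCapacity": 1},
    },
    {
        "SpotFleetRequestId": "sfr-workers",
        "SpotFleetRequestState": "active",
        "CreateTime": datetime(2024, 1, 1, 10, 0, tzinfo=timezone.utc),
        "SpotFleetRequestConfig": {"TargetCapacity": 2},
    },
]
print(pick_worker_fleet(SAMPLE_FLEETS))
```

The returned ID could then be passed to the EC2 client's `cancel_spot_fleet_requests` call with `TerminateInstances=True`. If auto scaling has changed the worker fleet's capacity, pass the current value as `worker_capacity` or check the request ID by hand before cancelling.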
You’ve completed Lab 3. Congrats!