In this section we will use Amazon Athena to run SQL queries against the output of our Spark application to verify that it completed successfully. We’ll compare the results of the run in which you did not interrupt the Spark job with the results of the run in which you interrupted three Spot task nodes.
Workgroup dropdown, click it and change it to
Go to the Saved queries tab, and open the EMRWorkshopResults saved query. This query creates the table whose source is the S3 bucket where the first Spark job saved its results.
Click on the
Go to the Saved queries tab again, and open the EMRWorkshopResultsSpot saved query. This query creates the table whose source is the S3 bucket where the subsequent Spark job, the one that you interrupted, saved its results.
To look at some of the results, run this query:
SELECT * FROM "EMRWorkshopResults" ORDER BY count DESC LIMIT 100;
And to confirm that the number of rows matches, run the following queries:
SELECT COUNT(*) FROM "EMRWorkshopResults";
SELECT COUNT(*) FROM "EMRWorkshopResultsSpot";
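Both queries should return the same count, which confirms that the Spark job produced complete results even though three Spot task nodes were interrupted.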
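As an optional extra check, the two counts can be returned side by side in a single result, and the table contents can be compared directly. This is a minimal sketch rather than part of the saved queries: the column aliases `rows_uninterrupted` and `rows_interrupted` are made up for illustration, and the second query assumes that both tables have the same columns in the same order.

```sql
-- Row counts of both tables in one result row.
-- The aliases are illustrative names, not part of the workshop.
SELECT
  (SELECT COUNT(*) FROM "EMRWorkshopResults")     AS rows_uninterrupted,
  (SELECT COUNT(*) FROM "EMRWorkshopResultsSpot") AS rows_interrupted;

-- Rows present in the first table but missing from the second.
-- Assumes both tables share the same column layout.
SELECT * FROM "EMRWorkshopResults"
EXCEPT
SELECT * FROM "EMRWorkshopResultsSpot";
```

If the second query returns no rows, every distinct row written by the uninterrupted job also appears in the output of the interrupted job. Note that `EXCEPT` compares distinct rows, so it complements the row counts rather than replacing them.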