
Creating an Alexa Skill to Play a Music File

Author: Santiago Ricoy Email: ricoys1@unlv.nevada.edu
Date: Last modified on 12/28/16
Keywords: alexa skill tutorial voice audio playback alexa sdk



By the end of this tutorial, you'll have a working Alexa Skill that plays music, which can be run by speaking a predefined phrase.

This tutorial works within the Amazon Developer Console, which gives developers access to many of Amazon's services; in this case, we use it to test the voice-controlled side of an Alexa skill. We need to understand how to stream files through our Alexa-enabled device. Solving this to some degree is important because it demonstrates the ability to fetch files from the web with the Alexa Voice Service (AVS), which implies that AVS can also be used to manipulate and control other items in the cloud, as well as connected devices. This tutorial shows you how to set up a Lambda function that talks to the voice-operated end of an Alexa skill. It takes approximately 40 minutes to complete.



Motivation and Audience

This tutorial's motivation is to spark innovation in attendees of the IEEE Winter School. The tutorial assumes the reader has the following background and interests:

* Can sign up and learn to use basic AWS services
* Can navigate Object Oriented code
* Additional helpful background includes prior exposure to JavaScript and Node.js
* This tutorial may also attract readers who are interested in voice recognition

The rest of this tutorial covers the required items, the construction of the skill and its Lambda function, an overview of the programming involved, and final words.


Required Items

To complete this tutorial, you'll need the following items:

NOTE: An Echo Dot was used to create this tutorial (the most convenient option, if not also the least expensive). Using a physical device is optional, because skills can be tested without one. However, since the testing environment gives no audio output, we would not be able to hear that music is actually being streamed without a device.


PART NAME/DESCRIPTION | VENDOR | VENDOR NUMBER or URL | PRICE | QTY
Amazon Echo Dot | Amazon.com | https://www.amazon.com/All-New-Amazon-Echo-Dot-Add-Alexa-To-Any-Room/dp/B01DFKC2SO/ref=sr_1_2?ie=UTF8&qid=1483065428&sr=8-2&keywords=echo | $49.99 | 1

Construction

Background

The Alexa Voice Service (AVS), by Amazon, has a developer package called the Alexa Skills Kit (ASK) that allows users to create new skills for Alexa-enabled devices. All Alexa-enabled devices can perform functions with these skills. The voice recognition itself is handled by Amazon, leaving the developer free to focus on designing the actual voice commands.

Most Alexa skills are made up of two parts: a skill created with the Alexa Skills Kit in the Amazon Developer Console, and an AWS Lambda function that the skill sends its requests to.

Here we'll initiate the two parts and then connect our skill to complete it:

Step 1:

Sign into Amazon Developer Console:

If you do not already have one, create an Amazon Developer Account and sign in to the Amazon Developer Console.

Step 2:

Sign into AWS Management console:

In a separate tab if you do not already have one, create an AWS Account and sign into the AWS Management Console.

Step 3:

Creating an IAM role:

From the AWS Management Console, under the “Security, Identity, and Compliance” category, click “IAM”. Go to “Roles” and create a new role. Name the role whatever you like; the name doesn't matter. Go to the next page.

This role will be given to a Lambda function, so we will select that option. Click through to the next page.

Select the “CloudWatchLogsFullAccess” and “AmazonDynamoDBFullAccess” policies. Click to the next page.

Review and create your role.

Step 4:

Creating a Lambda function:

In the upper right-hand corner, confirm that your region is set to “N. Virginia”. At the time of writing, the Alexa Skills Kit can only trigger Lambda functions hosted in this region (us-east-1) for US skills.

In the upper left of the window, click “Services”, and under “Compute” select “Lambda”. You should get some sort of introduction page if you're new. Click through to get started and “Create a Lambda Function”.

Now we are prompted to select a blueprint. Click “Configure Triggers” on the left.

Select the Alexa Skills Kit as your trigger. Move on to the next page.

Here you'll be prompted to give a name, description, and runtime. The first two are for your reference. As for the runtime, for this demo we'll use Node.js 4.3.

Scroll down to “Lambda function handler and role”. Leave the handler as “index.handler”, select “Choose an existing role”, and pick the IAM role we created in the previous step.

We will complete the Lambda function right after creating the Alexa Skill.

Step 5:

Configuring the Alexa Skill:

From the Amazon Developer Console dashboard, click on the “Alexa” tab and then “Get Started” with the Alexa Skills Kit.

Click “Add a new skill”. This will put you into the sequence for configuring your Alexa skill. We will be creating a “custom skill”, so please select that option.

The first section of the Alexa Skill, “Skill Information”, allows you to change the skill's name, as well as the invocation name. This is also where you can come back to find the skill's App ID after it is generated.

The skill's name is for your reference and is what shows up in the Alexa app should your skill be published. The invocation name is what will be said aloud to initiate the skill with an Alexa-enabled device. There is a link next to these options that can teach you more about invocation phrases.

Please select the “yes” radio button that confirms that our skill will use audio directives, since we will be playing music. Then click “Next”.
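
For context, an “audio directive” is a JSON instruction that our skill's response sends to the device to control playback. A Play directive, as defined by the Alexa Skills Kit's AudioPlayer interface, looks roughly like this (the URL and token here are placeholders):

{
  "type": "AudioPlayer.Play",
  "playBehavior": "REPLACE_ALL",
  "audioItem": {
    "stream": {
      "url": "https://example.com/audio/track.mp3",
      "token": "track-1",
      "offsetInMilliseconds": 0
    }
  }
}

The Alexa SDK builds this JSON for us, so we never have to write it by hand.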

This page allows us to create our interaction model. The top text box allows us to define which intents we would like to use in an intent schema. Intents are requests that can be triggered by verbal commands. A few of our intents are required by the Alexa Skills Kit because we will be streaming music. Intents are specified in JSON format.

In the extracted folder, inside the “speechAssets” folder, there is a file named “IntentSchema”. Copy and paste its contents into the intent schema text entry field. More information can be found on intents with the provided links next to the text box.
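
For illustration only (paste the repository's IntentSchema file, not this sketch), a minimal intent schema for an audio skill looks something like the following, combining our custom “PlayAudio” intent with built-in intents (Pause and Resume are required for audio-streaming skills; the others are recommended):

{
  "intents": [
    { "intent": "PlayAudio" },
    { "intent": "AMAZON.PauseIntent" },
    { "intent": "AMAZON.ResumeIntent" },
    { "intent": "AMAZON.NextIntent" },
    { "intent": "AMAZON.PreviousIntent" },
    { "intent": "AMAZON.HelpIntent" },
    { "intent": "AMAZON.StopIntent" },
    { "intent": "AMAZON.CancelIntent" }
  ]
}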

Also in the “speechAssets” folder is a file named “Utterances.txt”. Please copy and paste its contents into the “Sample Utterances” text entry field. The first portion of each line (here it is “PlayAudio”) is the intent that is invoked by the sample phrase written after it.
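
As a sketch of the format (again, use the repository's Utterances.txt rather than these made-up lines), each line pairs an intent name with one phrase a user might say:

PlayAudio play the audio
PlayAudio start playing
PlayAudio play my music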

Click “Next” to save and move to the next page.

Step 6:

The Global Fields page is where we define where our requests will be sent.

We need to complete our Lambda function in order to complete this portion. Go into the “Skill Information” section of the Alexa skill and copy the Alexa Skill App ID. Then, in the repository downloaded earlier, open the “constants.js” file and paste the App ID into the “appId” value.
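
For reference, the relevant part of “constants.js” looks roughly like the sketch below (the App ID and table name shown are placeholders, and the real file also defines other values, such as the skill's states):

// constants.js (sketch): paste your own skill's App ID into appId
module.exports = Object.freeze({
    appId: 'amzn1.ask.skill.your-app-id-here',  // placeholder: your Alexa Skill App ID
    dynamoDBTableName: 'AudioSampleTable'       // example name for the table the function will create
});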

Scroll back up to “Lambda function code”. For this step, open the repository downloaded earlier (the “skill-sample-nodejs-audio-player” folder).

With Node.js installed, using a command line (the Git shell is what I used on Windows), navigate to the “js” directory within the “skill-sample-nodejs-audio-player” folder.

Use this command:

npm install

It will probably stall for a second; don't worry. Once complete, you will see that a “node_modules” folder has been created in the “js” directory.

Select everything within the “js” directory, and compress it into a zip file.
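
One detail worth noting: Lambda looks for “index.js” at the top level of the zip (that is how the “index.handler” setting resolves), so compress the contents of the “js” directory rather than the folder itself. On macOS or Linux, from inside the “js” directory, a command like this works (the file name “audio-player.zip” is just an example):

zip -r ../audio-player.zip .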

We will now upload the created zip file in the “Lambda function code” section of our Lambda function. Move on to the next page and hit “Create function”. Once created, you will be given an Amazon Resource Name (ARN) at the top right of the page. Copy this.

In your Alexa Skill configuration page, select ARN as your service endpoint type, and paste your ARN into the text box, being sure to check the North America box. Move on to the next page and now everything is done!

It's time to begin testing…maybe troubleshooting?


Programming

We got it working, so, how exactly does this thing work?

Well, we'll go through a quick overview here, and I'll link relevant content as I go.

1. The Alexa Skill

The skill built through the developer console has no actual programming involved.

What we do define is what the end user will say to invoke our skill, or navigate options within the skill.

The intent schema and sample utterances combined represent things the user might say, and what to do in response.

The intent schema is where we set up which intents we will use. Intents are a way of specifying the request sent by the skill to our function in Lambda.

In our example, we use a custom intent called “PlayAudio”. In the sample utterances section, we wrote possible things the end user will say that we want to invoke the “PlayAudio” intent. You will notice that there are other intents in our intent schema. These are built-in intents, so we do not have to define the sample utterances that Alexa will understand for them.

For more on defining an Alexa skill's voice interface, please visit this link.

We must send our requests somewhere, and the global fields section of our skill allows us to specify whether we are sending requests to our own URL or to an AWS Lambda function. It is easier to use the Amazon Resource Name (ARN) of a Lambda function, because we do not have to deal with any specifics concerning how our requests are sent to the Lambda function; we only need to write the code that our function will use to handle requests.

2. The Lambda Function

The Lambda function is where our code is hosted and responds when triggered by the Alexa skill. Otherwise, the function will sit idle and do nothing, making it very efficient for our purposes. With that said, let's take a brief look at our code below.

'use strict';

var Alexa = require('alexa-sdk');                          // import the Alexa SDK
var constants = require('./constants');                    // values from constants.js (appId, table name)
var stateHandlers = require('./stateHandlers');            // intent handlers for each skill state
var audioEventHandlers = require('./audioEventHandlers');  // handlers for AudioPlayer events

exports.handler = function (event, context, callback) {
    var alexa = Alexa.handler(event, context);
    alexa.appId = constants.appId;
    alexa.dynamoDBTableName = constants.dynamoDBTableName;  // DynamoDB table for persisting playback state
    alexa.registerHandlers(  // register all of our handlers at once
        stateHandlers.startModeIntentHandlers,
        stateHandlers.playModeIntentHandlers,
        stateHandlers.remoteControllerHandlers,
        stateHandlers.resumeDecisionModeIntentHandlers,
        audioEventHandlers   // this file exports multiple handlers
    );

    alexa.execute();  // route the incoming request to the matching handler
};

Recall that on the Lambda configuration page we set the handler to “index.handler”. That is how our Lambda function accesses the handlers that are used to process requests. A handler looks for a specific intent and, when that intent is invoked, runs the actions we program. The actual handlers can be found inside the files imported into index.js.
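
As a sketch of what those imported handlers look like (simplified from the sample's stateHandlers.js; the state name, phrases, and URL below are placeholders), each state handler maps intent names to functions:

var Alexa = require('alexa-sdk');

// Simplified sketch of a state handler in the alexa-sdk v1 style.
var startModeIntentHandlers = Alexa.CreateStateHandler('_START_MODE', {
    'PlayAudio': function () {
        var url = 'https://example.com/audio/track.mp3'; // placeholder stream URL
        this.response.speak('Starting playback.')
            .audioPlayerPlay('REPLACE_ALL', url, 'track-1', null, 0); // issue the Play directive
        this.emit(':responseReady'); // send the response back to Alexa
    },
    'AMAZON.HelpIntent': function () {
        this.response.speak('Say, play the audio, to start the music.')
            .listen('Say, play the audio, to start the music.');
        this.emit(':responseReady');
    }
});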

For more on handling requests, please visit this link: Handling Requests

3. IAM and the DynamoDB Table

In our example, we don't explicitly set up an Amazon DynamoDB table; rather, it is set up by our Lambda function. However, when we created a role with the Identity and Access Management (IAM) service and set it as our Lambda function's role, we gave our function permission to create that table for us, as well as access to the Amazon CloudWatch service.

Amazon CloudWatch is a service that allows us to track metrics and logs for our Amazon Web Services (AWS) account. The skill doesn't use CloudWatch explicitly, but Lambda writes the function's logs there, which is why we attached the CloudWatch Logs policy; it is the DynamoDB policy that lets the function create its table.

NOTE: If you intend to repurpose this sample and change what is actually played by the skill, you will need to go into DynamoDB and delete the table created by the function. It holds details about what the user has played and may not update them correctly for new audio sources.
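
If you have the AWS CLI installed, the table can also be deleted from the command line; the table name below is whatever “dynamoDBTableName” is set to in your constants.js:

aws dynamodb delete-table --table-name AudioSampleTable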


Final Words

This walkthrough has gotten us off the ground with a music-playing Alexa skill, using built-in intents, Amazon DynamoDB, and Amazon IAM. For more on how skills work, please review other links here for an introduction to developing for Alexa.

For more information on this particular sample, please go through the README.md file.

For questions, clarifications, etc., email: ricoys1@unlv.nevada.edu
