Zero to Hero, Part 1: Alexa Skills Kit Overview


– Hello everyone, my name is Andrea, and I work as a Technology Evangelist on the Amazon Alexa team. Today I'm pleased to present a series about developing Alexa skills from scratch. It is divided into several videos, and each video covers a single module. If you go through all the modules, you will learn how to create a skill from A to Z. In this first video you will learn the basics: the terminology and a "Hello World" skill are the starting point for building a skill. Here we go! Let's get to work right away. Before programming, let's look at the basics. What is Alexa? Alexa offers various options for developers and device manufacturers. For example, the Alexa Voice Service enables device manufacturers to integrate Alexa into their products. All you need is a Wi-Fi chip and a microphone, and any device can become Alexa-enabled. So it's not only Echo devices that can use Alexa. The other side of Alexa is the Alexa Skills Kit.

This is about creating content that is then available to users: the skills. Skills are capabilities that developers teach Alexa so that she can do different things. How do you activate a skill? To activate a skill, you first wake the device with the wake word, "Alexa". After waking the device, you open the skill. You can use different words to do this: "open", "start" or "begin". In this case we open a skill called "Space Facts". So: "Alexa, open Space Facts". By the way, "Space Facts" is the invocation name. The invocation name is a name you choose and give to your skill during development. Users say it to open the skill. It's okay if it is multiple words, and it can be fairly general. A lot of people ask me whether invocation names need to be unique. The answer is no. If several skills have the same invocation name, it is up to the user to choose which one they want to use. Maybe you don't just want to open the skill, but also directly access a specific function of the skill.

For example, you say: "Alexa, ask Space Facts about a fact." That is an utterance: something users can say when they open the skill, or when the skill greets them and asks what they want to do. These utterances map to intents. There are several ways to say the same thing, but they all stand for the same intent, and that is the basis of programming for Alexa. You don't have to worry about specific sentences, only about intents. As a skill developer, you decide which intents your skill supports. For example: share a fact, plan a trip, stop, or start a new game. Whatever you like. And you provide some sample sentences that are representative of each intent. Then it's up to Alexa to map what the user says to the right intent. As a developer, you always work with intents. In this case we have a "GetNewFactIntent", which can be triggered by all sorts of different phrases, like "tell me a fact", "give me a fact" or just "a fact".
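To make this concrete, here is a minimal sketch of how such an intent and its sample utterances appear in the interaction model JSON. The invocation name and the exact list of samples are illustrative:

```json
{
  "interactionModel": {
    "languageModel": {
      "invocationName": "space facts",
      "intents": [
        {
          "name": "GetNewFactIntent",
          "samples": [
            "tell me a fact",
            "give me a fact",
            "a fact"
          ]
        }
      ]
    }
  }
}
```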

That is not always enough. Sometimes, as a developer, knowing the desired intent is not enough, because for some intents the user has to provide additional information. For example, in this sentence: "Alexa, ask Space Facts about a fact about Mars." I am asking for a specific fact about Mars. And that is called a slot. With Alexa, a slot corresponds to a linguistic variable, and that variable is the part of the user's utterance that you as a developer need in order to fulfill the concrete intent. Your intents can have more than one slot, and users can fill more than one slot in their utterances. In this case we would like to have a slot called "planet", because we want to share facts about specific planets.
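In the interaction model, a slot shows up as a placeholder in the sample utterances plus a slot definition on the intent. A sketch of the same intent with a slot added; the slot type name "PlanetType" is just an illustrative custom type:

```json
{
  "name": "GetNewFactIntent",
  "slots": [
    {
      "name": "planet",
      "type": "PlanetType"
    }
  ],
  "samples": [
    "a fact about {planet}",
    "tell me a fact about {planet}"
  ]
}
```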

There are built-in slots that Amazon has already trained on different models, like animals, names and cities, and you can also define custom slots. In a nutshell: we have "Alexa", which is the wake word. The launch phrase can be, for example, "ask", "open" or "start". Then comes the invocation name that you set for your skill, and the utterance. The utterance is what the user says, and it is assigned to a specific intent. If your intent needs a slot, the utterance carries the slot value and sends it along with the request, a JSON request. We'll see that later. You can define as many intents as you like, but Amazon also provides built-in intents, some of which are mandatory if you want to pass certification. Built-in intents are, for example, "Help", "Cancel" and "Stop", but there are also built-in intents such as "Repeat", or intents for things like providing an address.
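If you go with a custom slot, its possible values also live in the interaction model, alongside the intents. A minimal sketch of a custom slot type definition; the type name and the value list are illustrative:

```json
{
  "types": [
    {
      "name": "PlanetType",
      "values": [
        { "name": { "value": "mars" } },
        { "name": { "value": "jupiter" } }
      ]
    }
  ]
}
```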

The built-in intents are already trained on many utterances, which helps you as a developer: for very frequent utterances like "yes", "no", "repeat", "random", "play" and so on, you don't have to enter every possible sentence yourself. So "AMAZON.CancelIntent" corresponds to the command "cancel", "AMAZON.HelpIntent" is equivalent to "I need help", "AMAZON.StopIntent" covers "stop", and so on. Before we get into programming, let's summarize briefly: developing for Alexa is a two-sided process. On one side, you develop the voice interaction model. In the voice interaction model you define the name of the skill, the invocation name, the intents, the slots and the utterances, so basically everything that has to do with the voice interface. On the other side, you implement and manage all of these intents in the backend.

Since we recently introduced this feature, we use Alexa-hosted skills in this example, because then all you need is one website: developer.amazon.com. We don't need a separate backend. You may need that later, once you know your way around and want to create a custom AWS endpoint. For now we will use Alexa-hosted skills, since everything is hosted and administered for us and we can start right away. In our first few videos we won't deal with the folder structure of our skills in detail. We'll take a closer look at this later, when we use the Alexa Skills Kit Command Line Interface (ASK CLI). Just this much: the folder structure looks as follows. We have a models folder.

It contains all our interaction models, intents and utterances in a JSON file, one for every language that we want to support. And then we have a lambda folder, where our backend code lives. If you use the CLI, all of these folders are created together with a skill.json manifest. From that directory you can then version-control, manage and deploy your skill from the command line.
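Roughly, a CLI-generated project looks like this; the en-US locale file is just an example, and each additional locale gets its own JSON file:

```
hello-world/
├── skill.json           skill manifest
├── models/
│   └── en-US.json       interaction model, one file per locale
└── lambda/
    └── index.js         backend code
```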

But for now we're going to use the browser, where the interaction model and the code are stored separately in two places. We can start programming in a moment, but first I'll give you an overview of the architecture of Alexa. Everything starts with the user's utterance, for example: "Alexa, open Space Facts". This sentence spoken by the user is only noise for a device. So the device must have a chip that recognizes that the user has just said "Alexa", wakes up the device and sends the information to the Alexa service in the cloud. Once the information has been sent to the cloud, it must be converted into text. This is called automatic speech recognition. Now we have to find out what the sentence means. What does the user want to do? In our case, they'd like to know something, right? This is called natural language understanding. Now we map the text and the meaning of what the user wants to do to the actual intent. So when I say "Alexa, open Space Facts", Alexa understands that I said "Alexa, open Space Facts".

Alexa recognizes that I have a skill called "Space Facts" and that I am asking "Space Facts" about a fact. It's almost a tongue twister. In addition, Alexa also understands that I am triggering the "GetNewFactIntent" of "Space Facts", and sends it to the skill backend so that the request can be fulfilled. The process goes like this: the utterance goes to Alexa, Alexa recognizes that the user wants to use a skill, and the skill decides how it responds. The response can be text or audio, and it is sent back to the user. There are also images, which we will look at in one of the next videos. Let's get started! We are in the console at developer.amazon.com.

And we will create a new skill. To do this, I click on "Create Skill". You have different options here: you can choose between different types of skills, and you can host the backend yourself or let Alexa host it. I definitely take Alexa-hosted. We can't deal with all types of skills right now, but to give you an idea: Custom is the most common, because that way we can define our own intents. So I'll go for Custom. And of course we call it "Hello World". Then I click on "Create Skill", and Alexa now provisions all the resources. So, by choosing an Alexa-hosted skill, we have the interaction model and the backend available directly in the console.

You will see that there is already some content: the "HelloWorldIntent" and the four built-in intents "Cancel", "Help", "Stop" and "Navigate Home". We also have a Code tab, where the placeholder code of "Hello World" lives. To guide you through the console: we have different phases, represented by the tabs Build, Code, Test, Distribution, Certification and Analytics. They stand for each phase of skill development. Then we have our invocation name here, which we have set to "Hello World". And then we have our intents. Each intent can also have slots, which we will cover in more detail in a future video. And then we have the built-in intents, which we can extend with our own values. So let's go ahead and build our model. To do this, simply click on "Build Model" at the top. Now all of this information is sent to Alexa, which processes it and adds it to the knowledge engine. While the whole thing is being built, I will explain the JSON editor, which is essentially the JSON representation of everything we have done so far in this interface.
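For orientation, the JSON editor for this boilerplate skill shows roughly the following; the sample utterances are shortened here for brevity:

```json
{
  "interactionModel": {
    "languageModel": {
      "invocationName": "hello world",
      "intents": [
        { "name": "HelloWorldIntent", "samples": ["hello", "say hello"] },
        { "name": "AMAZON.CancelIntent", "samples": [] },
        { "name": "AMAZON.HelpIntent", "samples": [] },
        { "name": "AMAZON.StopIntent", "samples": [] },
        { "name": "AMAZON.NavigateHomeIntent", "samples": [] }
      ]
    }
  }
}
```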

In this case we haven't done much, because we are using the boilerplate template as a starting point, but in future videos we will change some things that will then show up here. So the model is being trained, and now everything is done. The full build is successful. Now let's take a look at what our code looks like. I'll zoom in a little so you can see it better. Here we have the lambda folder with our "index.js". This is our starting point. And here we have the Alexa SDK and a handler for each main intent. Here the handlers are still very simple. Each one has a "canHandle" method and a "handle" method. "canHandle" determines whether this is the right handler for the incoming request, and "handle" determines what needs to be done once the request is received. So in this case, if it is a "LaunchRequest", meaning we said "Alexa, open Hello World", the answer is: "Welcome, you can say hello or help. What do you want to try?" We send this text back with the "responseBuilder" by calling its "speak" method with the text as an argument.
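As a sketch, the launch handler in the boilerplate "index.js" looks roughly like this; the exact wording of the welcome message may differ slightly in your template:

```javascript
const Alexa = require('ask-sdk-core');

const LaunchRequestHandler = {
  // canHandle: is this the right handler for the incoming request?
  canHandle(handlerInput) {
    return Alexa.getRequestType(handlerInput.requestEnvelope) === 'LaunchRequest';
  },
  // handle: what to do once the request is received
  handle(handlerInput) {
    const speakOutput = 'Welcome, you can say hello or help. What do you want to try?';
    return handlerInput.responseBuilder
      .speak(speakOutput)      // the text Alexa speaks back
      .reprompt(speakOutput)   // keeps the session open and waits for user input
      .getResponse();
  }
};
```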

Then we can add a reprompt, which keeps the session open and waits for user input. You will find that the pattern is the same for the different intents, e.g. the "Hello World", "Help" and "Cancel" intents, and the "SessionEndedRequest", which is triggered when we say, for example, "quit". There are also a few more handlers for debugging, but we'll cover those in another video. Now we can test. We grant microphone access so that we can speak to the simulator properly, and enable testing in development mode for this skill. We have two options: either we type "Open Hello World".
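Continuing the previous sketch, an intent handler follows the same canHandle/handle pattern, and at the bottom of "index.js" all handlers are registered with the skill builder. Roughly:

```javascript
const HelloWorldIntentHandler = {
  canHandle(handlerInput) {
    return Alexa.getRequestType(handlerInput.requestEnvelope) === 'IntentRequest'
      && Alexa.getIntentName(handlerInput.requestEnvelope) === 'HelloWorldIntent';
  },
  handle(handlerInput) {
    // No reprompt here, so the session ends after the answer.
    return handlerInput.responseBuilder
      .speak('Hello World!')
      .getResponse();
  }
};

// All handlers are registered with the skill builder; the first handler
// whose canHandle returns true gets to process the request.
exports.handler = Alexa.SkillBuilders.custom()
  .addRequestHandlers(
    LaunchRequestHandler,
    HelloWorldIntentHandler
    // ... plus Help, Cancel/Stop and SessionEnded handlers
  )
  .lambda();
```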

– [Simulator] "Welcome, you can say hello or help. What do you want to try?" Or we talk to the simulator. – [Andrea] "Hello." – [Simulator] "Hello World!" And the Hello World intent is triggered. Before we get to the end of this video, I want to show you the most important skill input and output that you see when you use the simulator. This is the JSON input. Alexa sends this to our backend. You can see that there are various useful parameters, like the "userId", which is a randomly generated string for each user and each skill.
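A heavily trimmed example of what that JSON request looks like; the userId is shortened to a placeholder:

```json
{
  "version": "1.0",
  "session": {
    "user": {
      "userId": "amzn1.ask.account.AB12..."
    }
  },
  "request": {
    "type": "IntentRequest",
    "intent": {
      "name": "HelloWorldIntent"
    }
  }
}
```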

This gives you an overview of who comes back and who doesn't. And then we have our main section, the one with the actual request: in this case an intent request for the "HelloWorldIntent". And finally we have the JSON output, that is, what our skill returns. In our case we send "Hello World" back, because our "HelloWorldIntent" always returns "Hello World" when triggered. Our skill doesn't do much yet, but in the next videos we will gradually add functionality. Before I say goodbye, here are some key resources that are useful at the beginning of your skill-building journey. Thank you for watching and see you next time.
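And the corresponding JSON output, again trimmed, looks roughly like this; the SDK wraps the text in SSML:

```json
{
  "version": "1.0",
  "response": {
    "outputSpeech": {
      "type": "SSML",
      "ssml": "<speak>Hello World!</speak>"
    },
    "shouldEndSession": true
  }
}
```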
