31 Days of Mango | Day #27: Microphone API


This article is Day #27 in a series called 31 Days of Mango, and was written by guest author Parag Joshi. Parag can be reached on Twitter at @ilovethexbox.

Introduction

Speech recognition and note taking are common uses for a phone, depending on your profession and habits. For example, today’s mobile devices are expected to offer some form of speech recognition and voice control, whether for getting directions, dialing a number or, most commonly, playing music.

Note taking is another useful feature, for instance when recording findings in the field for later reference.

The very first step to all of these applications is to be able to capture the audio via the microphone and process it as needed.

So today we will look at the microphone API provided in Mango and we will put together a very simple sample to get you started.

What is involved?

At the very minimum, we need to record the audio and save it for playback. This gives the user the very basic ability to take notes.

The following key points should be noted:

  1. Microphone class: This class, provided by the Microsoft.Xna.Framework.Audio namespace, gives us access to the microphone API.
  2. public event EventHandler<EventArgs> BufferReady: This event is raised when the microphone is ready to release the buffered audio. We need to handle this event and store the audio for playback.
  3. Microphone.Start: As the name suggests, we call this to start recording.
  4. Microphone.Stop: We call this to stop recording. One key point to note here is that calling Microphone.Stop immediately clears out the buffer.

As you will see in the application, we don’t call Stop immediately when the user toggles the microphone or clicks the play button. Instead we let the microphone raise the BufferReady event once more, at the selected buffer duration, so that we capture the last bit of audio data before we stop recording.

using Microsoft.Xna.Framework: As you may have guessed from the Microphone namespace, we need a reference to the XNA framework. The microphone API is part of the XNA framework and requires us to simulate the XNA game loop. If you are not familiar with it, XNA is a rich framework provided by Microsoft for game and graphics-based applications.

Understanding the sample

Prerequisites: Install the Mango tools from http://create.msdn.com. This should give you Visual Studio Express 2010 and the Windows Phone SDK that you need to develop applications for Windows Phone.

1. Launch Visual Studio and browse to the solution file and open it. The application is built using the Silverlight "Windows Phone Application" template. Run the project and deploy to the emulator. You will see the following screen when the application finishes loading:

[Screenshot: the application's main screen]

2. Screen elements:

a. The microphone button is a toggle which starts and stops the microphone.

b. The play button is used to play back the recorded sound.

c. There are three slider controls to adjust the volume, pitch and pan of the sound being played back. These properties can be adjusted only before starting playback.

3. How it works:

a. Touch the microphone to start recording. You can stop recording by touching the microphone again or alternatively touching the play button.

b. Adjust the volume, pitch and pan one at a time and test the effect of each change by playing the recorded sound.

Understanding the code

Declarations: Here is a screenshot of the declarations.

[Screenshot: declarations]

We will be using an object of the SoundEffectInstance class to play back the recorded audio. We could also have used a SoundEffect; however, using the SoundEffectInstance class allows us to track the state (playing or stopped).

The other declaration here is for a MemoryStream object. The microphone buffer is continually written to the MemoryStream until playback is requested. At that time, we submit the contents of the MemoryStream to the SoundEffectInstance object to play.

Initialization:

[Screenshot: initialization code]

The key point to note here is the game loop we have created using the DispatcherTimer. This loop is essential to capturing the audio from the microphone.

We set the image for the play icon based on the light or dark theme used in the phone.
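As a rough sketch of such a loop, assuming the standard Silverlight DispatcherTimer and XNA's FrameworkDispatcher, a timer created on the UI thread can pump the XNA dispatcher about every 33 ms. The class name XnaGameLoop and its Start method are made up for illustration, and the sample's own code may be organized differently.

```csharp
using System;
using System.Windows.Threading;
using Microsoft.Xna.Framework;

// Hypothetical helper, not taken from the sample.
public static class XnaGameLoop
{
    // Call this from the UI thread, for example in the page constructor.
    public static DispatcherTimer Start()
    {
        // 33 ms is roughly 30 ticks per second.
        var timer = new DispatcherTimer { Interval = TimeSpan.FromMilliseconds(33) };

        // FrameworkDispatcher.Update lets XNA process its pending work,
        // which is what allows the microphone's BufferReady event to fire.
        timer.Tick += (sender, e) => FrameworkDispatcher.Update();

        timer.Start();
        return timer;
    }
}
```

Keeping a reference to the returned timer lets the page stop the loop later if it needs to.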

At this point we also set up the microphone defaults for our application as follows:

[Screenshot: microphone defaults]

The buffer duration is set to half a second, and we then pass that duration to the GetSampleSizeInBytes method to get the matching buffer size in bytes. This is important to ensure smooth audio capture.

We wire up the BufferReady event and set the default stopped microphone image.
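The setup described above might look roughly like the following. This is a sketch under assumptions: the class MicrophoneSetup, its Configure method and the Buffer property are hypothetical names, the handler is passed in by the caller, and the UI work (setting the button image) is left out.

```csharp
using System;
using Microsoft.Xna.Framework.Audio;

// Hypothetical wrapper, not taken from the sample.
public class MicrophoneSetup
{
    // Microphone.Default is the device's default microphone.
    private readonly Microphone microphone = Microphone.Default;

    // Holds one buffer's worth of audio each time BufferReady fires.
    public byte[] Buffer { get; private set; }

    public void Configure(EventHandler<EventArgs> onBufferReady)
    {
        // Ask the microphone to hand over audio every half second.
        microphone.BufferDuration = TimeSpan.FromMilliseconds(500);

        // Size the byte array to hold exactly that much audio.
        Buffer = new byte[microphone.GetSampleSizeInBytes(microphone.BufferDuration)];

        // The handler is called each time a full buffer is available.
        microphone.BufferReady += onBufferReady;
    }
}
```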

We are ready to start recording!

Recording Audio:

When the user clicks the microphone button the following code is called:

[Screenshot: microphone button click handler]

There are a few things happening here:

1. Microphone is stopped: If the microphone is stopped we need to start it to begin recording. We set the background of the microphone button. Then we reset the MemoryStream to clear out previously recorded audio. We also check whether the recorded sound is playing and stop it if so.

Fairly straightforward steps. At a minimum we need to call Microphone.Start(). The rest of the steps are based on the UI and application design.

2. Microphone is recording: At a minimum we have to stop recording. If you recall the note from above, we cannot call Microphone.Stop immediately because not all of the recorded data has been flushed to the MemoryStream yet. So we use boolean variables to keep track and defer the stop action to the DispatcherTimer event.

Two things need to happen here. First, we need to let the microphone's BufferReady event fire so we can read the last bit of audio, and then we need to trigger playback. We handle it as follows:
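The two branches of the toggle can be sketched as below. The names MicrophoneToggle, Toggle, Playback and StopRequested are assumptions made for this illustration, and the button image updates from the sample are omitted.

```csharp
using System.IO;
using Microsoft.Xna.Framework.Audio;

// Hypothetical class, not taken from the sample.
public class MicrophoneToggle
{
    private readonly Microphone microphone = Microphone.Default;
    private readonly MemoryStream stream = new MemoryStream();

    // Assumed to be set by the playback code once a recording has been played.
    public SoundEffectInstance Playback { get; set; }

    // Read elsewhere (for example in the BufferReady handler) to finish the stop.
    public bool StopRequested { get; private set; }

    public void Toggle()
    {
        if (microphone.State == MicrophoneState.Stopped)
        {
            // Stop any recording that is still playing.
            if (Playback != null && Playback.State == SoundState.Playing)
            {
                Playback.Stop();
            }

            // Discard the previous recording and start a new one.
            stream.SetLength(0);
            StopRequested = false;
            microphone.Start();
        }
        else
        {
            // Do not call Stop here: only record the request so the
            // final buffer can still be delivered and saved.
            StopRequested = true;
        }
    }
}
```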

[Screenshot: BufferReady event handler]

In the BufferReady event handler, we use our boolean variables to check whether recording has been stopped, and if so we call Stream.Flush(). This flushes the remaining data to the MemoryStream. Then we stop the microphone.

However, we cannot trigger playback in this event. That is handled by the DispatcherTimer tick event as follows:
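A minimal sketch of such a handler, assuming the microphone's BufferDuration has already been set, is shown here. The class BufferedRecorder and the StopRequested and StreamFlushed flags are hypothetical stand-ins for the boolean variables mentioned above.

```csharp
using System;
using System.IO;
using Microsoft.Xna.Framework.Audio;

// Hypothetical class, not taken from the sample.
public class BufferedRecorder
{
    private readonly Microphone microphone;
    private readonly MemoryStream stream;
    private readonly byte[] buffer;

    public bool StopRequested { get; set; }          // set when the user stops recording
    public bool StreamFlushed { get; private set; }  // checked before playback starts

    public BufferedRecorder(Microphone microphone, MemoryStream stream)
    {
        this.microphone = microphone;
        this.stream = stream;
        buffer = new byte[microphone.GetSampleSizeInBytes(microphone.BufferDuration)];
        microphone.BufferReady += OnBufferReady;
    }

    private void OnBufferReady(object sender, EventArgs e)
    {
        // Copy the captured audio out of the microphone and append it to the recording.
        int bytesRead = microphone.GetData(buffer);
        stream.Write(buffer, 0, bytesRead);

        if (StopRequested)
        {
            // The last chunk has been saved, so it is now safe to stop.
            stream.Flush();
            microphone.Stop();
            StreamFlushed = true;
        }
    }
}
```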

[Screenshot: DispatcherTimer tick handler]

The tick event is raised every 33 ms, so it fires soon enough after the user selects playback that there is no noticeable lag. The advantage, of course, is that we can play back the entire recording rather than letting it get cut off.

We check if it is time to play the recording and whether the stream has been flushed. If so, we start a new thread for audio playback. This is an important point to note.

We trigger audio playback on a different thread to allow the UI to update. This means any code inside our playback routine that updates UI elements has to do so by calling Dispatcher.BeginInvoke, as the playback code below does.
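The tick logic can be sketched along these lines. PlaybackTrigger, PlayRequested, StreamFlushed and the playRecording delegate are names invented for this example, and the sketch assumes the tick handler is also where the XNA dispatcher gets pumped.

```csharp
using System;
using System.Threading;
using Microsoft.Xna.Framework;

// Hypothetical class, not taken from the sample.
public class PlaybackTrigger
{
    private readonly Action playRecording;

    public bool PlayRequested { get; set; }  // set when the user asks for playback
    public bool StreamFlushed { get; set; }  // set once the last buffer has been saved

    public PlaybackTrigger(Action playRecording)
    {
        this.playRecording = playRecording;
    }

    // Attach this to DispatcherTimer.Tick.
    public void OnTick(object sender, EventArgs e)
    {
        // Keep the simulated XNA game loop running.
        FrameworkDispatcher.Update();

        if (PlayRequested && StreamFlushed)
        {
            // Clear the flags so playback is started only once.
            PlayRequested = false;
            StreamFlushed = false;

            // Run playback on its own thread so the UI stays responsive.
            new Thread(() => playRecording()).Start();
        }
    }
}
```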

Playback:

[Screenshot: playback code]

We create a SoundEffect from the captured audio stream, the sample rate of the microphone and the audio channel, and from it we obtain our SoundEffectInstance object.

Since we wish to use the sliders to adjust volume, pitch and pan, we have to use Dispatcher.BeginInvoke, because the sliders belong to the UI thread and our playback code runs on a different thread.

Finally we call Play.
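A hedged sketch of the playback step follows. RecordingPlayer and the three Func<float> delegates that read the sliders are assumptions for illustration. Unlike the description above, this version also calls Play inside the dispatched block so that the slider values are guaranteed to be applied first, which may differ from what the sample does.

```csharp
using System;
using System.IO;
using System.Windows.Threading;
using Microsoft.Xna.Framework.Audio;

// Hypothetical class, not taken from the sample.
public class RecordingPlayer
{
    // Intended to be called from the playback thread.
    public void Play(MemoryStream recorded, int sampleRate, Dispatcher uiDispatcher,
                     Func<float> readVolume, Func<float> readPitch, Func<float> readPan)
    {
        // Nothing to play if no audio was captured.
        if (recorded.Length == 0)
        {
            return;
        }

        // The phone's microphone records a single (mono) channel.
        var effect = new SoundEffect(recorded.ToArray(), sampleRate, AudioChannels.Mono);
        SoundEffectInstance playback = effect.CreateInstance();

        // The sliders live on the UI thread, so read them there.
        uiDispatcher.BeginInvoke(() =>
        {
            playback.Volume = readVolume();  // 0.0 to 1.0
            playback.Pitch = readPitch();    // -1.0 to 1.0
            playback.Pan = readPan();        // -1.0 to 1.0
            playback.Play();
        });
    }
}
```

A caller would typically pass Microphone.Default.SampleRate as the sample rate and the page's Dispatcher as uiDispatcher.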

Summary

So, it's fairly simple to create an application to record audio. We can extend this application to save the recorded audio to isolated storage and give it a title chosen by the user. We can add a list view of the recordings from isolated storage and turn this into a note taking application.

The basic steps to record audio are

a) Wire an event to the default microphone to capture the audio.

b) Write the audio to a stream.

c) When the user stops recording, flush the stream and store it or play it back.

We can also extend this sample by sending the recorded audio to a speech translation service and capturing commands.

To download a full Windows Phone application that uses all of the code and examples from above, click the Download Code button below.

[Download Code button]

Tomorrow, Jeff Fansler will be covering the Media Library class, and how we can use it to learn more about a user’s music library on their phone.  See you then!


