Creating Digital Audio with ALSA

Sound in Linux is a complicated beast. There’s a stack of software (including things like JACK and PulseAudio) that is responsible for receiving and mixing all the different types of audio streams and playing them through your speakers. At the bottom of this stack, interfacing with your sound card, is ALSA.

ALSA stands for Advanced Linux Sound Architecture. It provides the Linux device drivers for interacting with your computer’s sound card, but this definition does not fully do it justice. ALSA has a raft of features and capabilities, including virtualized sound cards for compatibility with Linux’s legacy sound system, OSS. ALSA also provides a very rich development API that you can use to control your sound card.

Creating your own custom digital sounds, and using ALSA to play them, is the topic of this post.

To follow along, all you’ll need is some exposure to C programming, your own installation of Linux, and a little bit of patience. I’m not assuming any prior knowledge of digital audio.

Digital Sound Background

Sound is how we humans perceive time-varying waves of air pressure called sound waves. The amount by which the air pressure varies is the amplitude of the sound wave and is perceived as the volume of the sound we hear. The rate at which the pressure varies is the frequency of the wave and is what we perceive as the sound’s pitch. Consider the open A string on a violin, with a pitch colloquially known as A 440. It has this name because the wave oscillates 440 times per second, and so we say it has a frequency of 440 Hz. The figure below shows how the sound wave for this note would vary with time.

440 Hz sound wave

Mathematically, this wave can be represented by the equation below,

f(t) = sin(2 π 440 t)

A key observation here is how the wave varies smoothly with time. It’s as if the wave were infinitely divisible in both time and amplitude (signal processing folks would call this a continuous time signal). This could also be referred to as an analog wave. This post, however, deals with digital audio, not analog, and the thing about digital audio is: you can’t get this infinite divisibility. The good news is that this limitation doesn’t prevent us from approximating the analog wave pretty well. We do this by sampling the wave.

Think of sampling like taking a collection of snapshots of the analog wave’s value at discrete, evenly spaced instants in time. The time instant and the value of the wave at that instant together are referred to as 1 sample. We can sample our analog wave many times each second to produce a digital approximation of the wave. The sequence of samples would be referred to by the signal processing community as a discrete time signal.

For reference, the Nyquist-Shannon sampling theorem says that you should always choose a sampling rate that is at least twice the highest frequency you expect to be able to reproduce. Humans can’t really hear pitches above approximately 20,000 Hz, or 20 kHz, so CD audio uses a sampling rate of 44.1 kHz. Since humans don’t really talk at a pitch much higher than around 3 kHz, our voices are sampled at a rate of 8 kHz when we talk on the phone or use walkie-talkies.
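To make the arithmetic concrete, here is a minimal sketch of the "at least twice the highest frequency" rule. The helper names nyquist_limit and rate_is_sufficient are my own, invented for illustration, and are not part of any library.

```c
/* Highest frequency (in Hz) that a given sampling rate can represent. */
double nyquist_limit(double sampling_rate)
{
  return sampling_rate / 2.0;
}

/* Returns 1 if sampling_rate is at least twice highest_freq, 0 otherwise. */
int rate_is_sufficient(double sampling_rate, double highest_freq)
{
  return sampling_rate >= 2.0 * highest_freq;
}
```

With these, rate_is_sufficient(44100.0, 20000.0) and rate_is_sufficient(8000.0, 3000.0) both hold, which matches the CD and telephone figures above, while an 8 kHz rate falls well short of covering 20 kHz.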

Let’s say, for example, that I sampled the wave in the figure above 4400 times each second. This would give me 10 samples for each period. The frequency with which I sample my wave is called the sampling rate, and so in this case my sampling rate is 4400 Hz. The figure below shows the samples I would receive (in pink) from the sound wave (in blue).

Sound Wave, 440 Hz, sampled

Consider the mathematics behind sampling the wave:

Each sampling instance, n, is an instance of time, t, where t is only defined in multiples of the sampling period τ, which is the reciprocal of the sampling rate, fs.

τ = 1 / fs

t = n τ,    n = 0, 1, 2, …

If we substitute this back into our equation for the analog sound wave, with f standing for the frequency of the wave, we get the following. Note, I’m using square brackets to denote a discrete time signal, versus the round brackets for a continuous time signal.

f[n] = f(n τ) = sin(2 π f n τ)

 = sin(2 π (f / fs) n)

If our sampling rate is 4400 Hz, then our τ = 1/4400. Also remember the frequency of our continuous time signal is 440 Hz. Substituting these values into our equation we get

f[n] = sin(2 π (440 / 4400) n)

 = sin(2 π (1/10) n)

This is the equation for our discrete time signal. Note that its discrete time frequency is 1/10, which means it goes through 1 period of oscillation every 10 samples.

Now the trick becomes, ‘how do I create these samples?’ and, once I have them, ‘how do I play them?’.

It turns out, creating the samples for a simple 440 Hz sine wave is pretty easy to do. Check out the program below:

#include <stdio.h>
#include <stdlib.h>
#include <math.h>
#include <limits.h>

#define SAMPLING_RATE 4400
#define NOTE_LENGTH 1.0
#define PI 3.14159265359

int main(){
  // Calculate how many samples we'll produce
  int num_samples = SAMPLING_RATE * NOTE_LENGTH;

  // Allocate memory to store our samples
  int * samples = malloc(num_samples * sizeof(int));
  if(samples == NULL){
    fprintf(stderr, "Could not allocate memory for samples\n");
    return 1;
  }

  // Create the samples
  int n;
  double tmp_sample;
  double note_frequency = 440.0;
  for(n=0; n<num_samples; n++){

    // Calculate the sample value (in floating point)
    tmp_sample = sin(2.0 * PI * note_frequency/SAMPLING_RATE * n);

    // Convert back to integer
    samples[n] = INT_MAX * tmp_sample;
  }
  // Do something useful with samples
  // ...

  // Release the memory once we're done with the samples
  free(samples);
  return 0;
}
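The program above scales each sample to the full range of a 32 bit int. The ALSA example later in this post uses signed 16 bit samples instead, so here is a small sketch of converting a floating point sample to that range. The helper name to_s16 and the choice to clamp out-of-range input are my own assumptions, not something ALSA requires.

```c
#include <stdint.h>

/* Convert a sample in the range [-1.0, 1.0] to a signed 16 bit integer.
   Values outside that range are clamped rather than allowed to wrap. */
int16_t to_s16(double x)
{
  if (x > 1.0) x = 1.0;
  if (x < -1.0) x = -1.0;

  /* Scale to the 16 bit range and round to the nearest integer */
  double scaled = x * 32767.0;
  return (int16_t)(scaled >= 0.0 ? scaled + 0.5 : scaled - 0.5);
}
```

Clamping matters because a value that overflows the integer range wraps around to the opposite sign, which is heard as a loud click.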

The rest of this post focuses on getting these samples to play through the speakers, using ALSA.

ALSA Basics

ALSA comes with an API that you can use to configure and control your sound card. To use the API, though, as well as to understand the API documentation, you’ll need to know the vernacular.

ALSA keeps track of a hardware buffer on your sound card. This buffer contains samples that your sound card plays at the configured sampling rate.

Your sound card can have more than 1 channel. Most commonly, your sound card has two channels, left and right (for stereo sound). In ALSA, a frame is a set of samples (one per channel) for a single sampling instance. The way frames are stored within the buffer is configurable, but generally you will interleave them (alternate left sample, right sample, etc.).

ALSA transfers sets of frames to the hardware buffer in fixed-size chunks called periods. The length of a period is configurable, both in time and in number of sampling instances (these 2 quantities are related by your sampling rate).
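The relationship between a period’s length in time, in frames, and in bytes is simple arithmetic. The sketch below spells it out; period_frames, period_bytes and frames_to_seconds are hypothetical helper names of my own, not ALSA functions.

```c
/* Number of frames in a period lasting period_seconds at sampling_rate,
   rounded to the nearest whole frame. */
unsigned long period_frames(unsigned int sampling_rate, double period_seconds)
{
  return (unsigned long)(sampling_rate * period_seconds + 0.5);
}

/* Bytes needed to hold that many frames: each frame carries one sample
   per channel. */
unsigned long period_bytes(unsigned long frames, unsigned int channels,
                           unsigned int bytes_per_sample)
{
  return frames * channels * bytes_per_sample;
}

/* How long a given number of frames lasts, in seconds. */
double frames_to_seconds(unsigned long frames, unsigned int sampling_rate)
{
  return (double)frames / sampling_rate;
}
```

For example, a 10 ms period at 44100 Hz works out to 441 frames, and with 2 channels of 2 byte samples that is 1764 bytes.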

Consider the example below. We’re going to use a signed 16-bit integer to represent each sample (just like CD audio). A left sample and a right sample together make up a frame. A set of frames becomes a period, and ALSA transfers the period of samples to the end of a hardware buffer.

Organization of samples to be passed
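As a rough sketch of that layout, here is how one interleaved stereo frame of signed 16 bit, little endian samples could be packed into a byte buffer. The function name pack_stereo_frame is my own, and it assumes the caller has allocated at least 4 * (frame_index + 1) bytes.

```c
#include <stddef.h>
#include <stdint.h>

/* Write one stereo frame into an interleaved, little endian buffer.
   Each frame occupies 4 bytes: left low, left high, right low, right high. */
void pack_stereo_frame(unsigned char *buffer, size_t frame_index,
                       int16_t left, int16_t right)
{
  unsigned char *frame = buffer + 4 * frame_index;

  /* Work on the unsigned bit patterns so the shifts are well defined */
  uint16_t l = (uint16_t)left;
  uint16_t r = (uint16_t)right;

  frame[0] = l & 0xff;         /* left sample, low byte   */
  frame[1] = (l >> 8) & 0xff;  /* left sample, high byte  */
  frame[2] = r & 0xff;         /* right sample, low byte  */
  frame[3] = (r >> 8) & 0xff;  /* right sample, high byte */
}
```

Packing bytes by hand like this keeps the buffer little endian regardless of the byte order of the machine the program runs on.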

First ALSA Program

Here, I’ll try to create the Hello World of ALSA programs. The goal is to produce one second’s worth of a 440 Hz sound wave. I’ll show the source code below. I’ve tried to add lots of helpful comments to make it a little more obvious what is going on.

#define ALSA_PCM_NEW_HW_PARAMS_API
#include <stdio.h>
#include <stdlib.h>
#include <math.h>

#include <alsa/asoundlib.h>

int main(){
  unsigned int sampling_rate = 44100; // set_rate_near expects an unsigned int
  int dir = 0, rc, i, j = 0;
  char * buffer;
  double x, y, freq = 440.0, seconds = 1;
  int sample, amp = 10000;

  // Here are some alsa specific structures
  snd_pcm_t * handle; // A reference to the sound card
  snd_pcm_hw_params_t * params; // Information about hardware params
  snd_pcm_uframes_t frames = 4; // The size of the period

  // Here we open a reference to the sound card
  rc = snd_pcm_open(&handle, "default", SND_PCM_STREAM_PLAYBACK, 0);
  if(rc < 0){
    fprintf(stderr,
      "unable to open default device: %s\n", snd_strerror(rc));
    exit(1);
  }

  // Now we allocate memory for the parameters structure on the stack
  snd_pcm_hw_params_alloca(&params);

  // Next, we're going to set all the parameters for the sound card

  // This sets up the soundcard with some default parameters and we'll 
  // customize it a bit afterwards
  snd_pcm_hw_params_any(handle, params);

  // Set the samples for each channel to be interleaved
  snd_pcm_hw_params_set_access(handle, params, 
    SND_PCM_ACCESS_RW_INTERLEAVED);

  // This says our samples are represented as signed, 16 bit integers in
  // little endian format
  snd_pcm_hw_params_set_format(handle, params, SND_PCM_FORMAT_S16_LE);

  // We use 2 channels (left audio and right audio)
  snd_pcm_hw_params_set_channels(handle, params, 2);

  // Here we set our sampling rate. 
  snd_pcm_hw_params_set_rate_near(handle, params, &sampling_rate, &dir);

  // This sets the period size
  snd_pcm_hw_params_set_period_size_near(handle, params, &frames, &dir);

  // Finally, the parameters get written to the sound card
  rc = snd_pcm_hw_params(handle, params);
  if(rc < 0){
    fprintf(stderr, "unable to set the hw params: %s\n",snd_strerror(rc));
    exit(1);
  }

  // This allocates memory to hold our samples
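  // One period of frames: 2 channels * 2 bytes per sample = 4 bytes per frame
  buffer = (char *) malloc(frames * 4);
  if(buffer == NULL){
    fprintf(stderr, "unable to allocate the sample buffer\n");
    exit(1);
  }
  j = 0;
  for (i=0;i< seconds * sampling_rate; i++){
    // Create a sample and convert it back to an integer
    x = (double) i / (double) sampling_rate;
    y = sin(2.0 * 3.14159 * freq * x);
    sample = amp * y;

    // Store the same sample for both channels using Little Endian format
    buffer[0 + 4*j] = sample & 0xff;          // left, low byte
    buffer[1 + 4*j] = (sample & 0xff00) >> 8; // left, high byte
    buffer[2 + 4*j] = sample & 0xff;          // right, low byte
    buffer[3 + 4*j] = (sample & 0xff00) >> 8; // right, high byte

    // If we have a buffer full of samples, write 1 period of
    // samples to the sound card. Incrementing j before the comparison
    // keeps us from writing one frame past the end of the buffer.
    if(++j == (int) frames){
      rc = snd_pcm_writei(handle, buffer, frames);

      // On an under run (or other write error), reset the device
      if (rc < 0){
        snd_pcm_prepare(handle);
      }
      j = 0;
    }
  }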

  // Play all remaining samples before exiting
  snd_pcm_drain(handle);

  // Close the sound card handle and release our buffer
  snd_pcm_close(handle);
  free(buffer);
  return 0;
}

What’s Next?

Check out the ALSA API documentation for a more in-depth look at the functions you’ve got available. Try creating more interesting sounds by changing the frequency of the wave as you produce samples (a technique called frequency modulation), or try implementing a Karplus-Strong filter as an easy way to get a half-decent plucked string sound.
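If you want a starting point for the plucked string idea, here is a minimal sketch of the basic Karplus-Strong scheme: fill a delay line with noise, then repeatedly output the oldest value and replace it with a slightly damped average of itself and its neighbour. The function name karplus_strong, the 0.996 damping factor and the 10000 amplitude are arbitrary choices of mine, so treat this as one possible implementation rather than a reference one.

```c
#include <stdint.h>
#include <stdlib.h>

/* Fill out[0..num_samples-1] with a plucked string tone near freq Hz.
   Returns 0 on success and -1 on bad arguments or allocation failure. */
int karplus_strong(int16_t *out, size_t num_samples,
                   unsigned int sampling_rate, double freq, unsigned int seed)
{
  if (out == NULL || freq <= 0.0) return -1;

  /* The delay line length sets the pitch: one period's worth of samples */
  size_t delay_len = (size_t)(sampling_rate / freq + 0.5);
  if (delay_len < 2) return -1;

  double *delay = malloc(delay_len * sizeof *delay);
  if (delay == NULL) return -1;

  /* The "pluck": start with a burst of noise in the range [-1, 1] */
  srand(seed);
  for (size_t i = 0; i < delay_len; i++)
    delay[i] = 2.0 * rand() / RAND_MAX - 1.0;

  size_t pos = 0;
  for (size_t n = 0; n < num_samples; n++) {
    size_t next = (pos + 1) % delay_len;
    double current = delay[pos];

    out[n] = (int16_t)(current * 10000.0);

    /* Averaging neighbours is a simple low pass filter; the 0.996
       factor makes the note die away over time */
    delay[pos] = 0.996 * 0.5 * (current + delay[next]);
    pos = next;
  }

  free(delay);
  return 0;
}
```

The resulting 16 bit samples can be copied into the left and right halves of each frame in the same way as the sine wave samples in the ALSA program above.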
