The CSS Speech Module

Styling voice with CSS

You might think this is an idea for some future CSS module, but work on the specification actually began back in 2003!

But why have you never heard about it? Because (almost) no browser supported it and the draft was retired 15 years later.

History

When CSS 2.0 was published, part of the release was ACSS, the Aural CSS module. In CSS 2.1, however, the aural media type was deprecated and replaced with the reserved speech media type. The specification only reserved the name; it didn't define any of its properties or values.

In 2012 the CSS Speech Module reached CR (Candidate Recommendation) status and stayed there until it was retired in 2018, before any of the major browsers had implemented it.

But what was the idea behind the module?

The idea

The speech module lets you style how the elements in your document are spoken. It contains properties that specify how a speech synthesizer renders a document: volume, voice, speed, pitch, cues, pauses and so on.

The module aimed to assist people who are blind, visually impaired or otherwise print-disabled by enabling websites to optimize their content aurally. The technology could also have been used for other things, like teaching kids how to read.

Properties

Let’s start with some straightforward properties. I will only cover parts of the specification here; if you want to learn more, I recommend checking out the draft!

voice-volume

With the voice-volume property you can control the volume.

Valid values are silent, x-soft, soft, medium, loud and x-loud. You can also use decibels if you’d like! A decibel value represents a change (positive or negative) relative to the given keyword value (see the list above) or, if no keyword is given, relative to the inherited volume level (the default level on the root element).

h1 {
  voice-volume: medium 6dB;
}
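To make the keyword and decibel forms a bit more concrete, here is a small sketch. The class names are made up for illustration, and since no major browser implemented the module, read it as an illustration of the draft's syntax rather than something you can try out.

```css
/* Keyword only: one of the predefined volume levels */
.aside {
  voice-volume: soft;
}

/* Keyword plus offset: 6 dB quieter than the "loud" level */
.warning {
  voice-volume: loud -6dB;
}

/* No sound is generated for this element */
.muted {
  voice-volume: silent;
}
```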

voice-balance

With voice-balance you can control the spatial distribution of audio output. You can use a number between -100 and 100 or any of the following values: left, center, right, leftwards, rightwards.

h1 {
  voice-balance: left;
}
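The numeric form works along the same lines. In the sketch below, which uses hypothetical class names, -100 stands for fully left, 0 for center and 100 for fully right, as I understand the draft.

```css
/* Centered output, same as the "center" keyword */
.narrator {
  voice-balance: 0;
}

/* Mostly on the left channel */
.left-speaker {
  voice-balance: -60;
}

/* Mostly on the right channel */
.right-speaker {
  voice-balance: 60;
}
```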

speak

The speak property determines whether or not text is rendered aurally. You can use auto, never or always. It’s important to know that the initial value of speak is auto, and that auto follows the display property of the element. So if you set your element to display: none, speak resolves to never; otherwise the element is spoken.

h1 {
  speak: always;
}
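Here is a short sketch of how speak and display could play together. The class names are hypothetical, and the comments reflect my reading of the draft, not the behavior of any real browser.

```css
/* Hidden visually. With speak left at auto this element would not
   be spoken; always asks for it to be spoken anyway. */
.spoken-only {
  display: none;
  speak: always;
}

/* Visible on screen, but skipped by the speech synthesizer */
.decoration {
  speak: never;
}
```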

speak-as

You can use speak-as to determine in what manner text gets rendered aurally, based upon a predefined list of possibilities.

Valid values are normal, spell-out, digits, literal-punctuation and no-punctuation. For example, spell-out spells the text one letter at a time, and digits speaks numbers one digit at a time.

h1 {
  speak-as: spell-out;
}
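A few more use cases, sketched with made-up class names. They only illustrate the values listed above and have not been tested in any browser.

```css
/* Read "2018" as "two zero one eight" rather than as one number */
.phone-number {
  speak-as: digits;
}

/* Speak punctuation marks out loud, e.g. when reading code */
.code-sample {
  speak-as: literal-punctuation;
}

/* Neither speak punctuation nor render it as pauses */
.ticker {
  speak-as: no-punctuation;
}
```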

The aural formatting model

You can think of the properties rest, cue and pause as the aural equivalents of padding, border and margin, respectively. They surround the styled element.

The CSS formatting model for aural media is based on a sequence of sounds and silences that occur within a nested context similar to the visual box model, which we name the aural “box” model.

The image above visualises the order of pause, cue, and rest as well as their visual box model equivalents.
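In CSS, the aural "box" around an element could be sketched like this. The audio file names are placeholders I made up, and the declarations are listed from the outside in and back out again to mirror the model.

```css
blockquote {
  pause-before: strong;              /* outermost silence, like margin */
  cue-before: url(quote-start.wav);  /* sound marking the start, like border */
  rest-before: 200ms;                /* silence next to the content, like padding */

  rest-after: 200ms;
  cue-after: url(quote-end.wav);     /* hypothetical file, like the one above */
  pause-after: strong;
}
```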

Voice characteristics

The CSS speech module also provides properties to modify the voice characteristics. You can change the sound of the voice with voice-family (an equivalent to font-family), the speed of the spoken text with voice-rate, and the pitch (voice-pitch), range (voice-range) and stress (voice-stress).
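As a rough sketch of how these properties might be combined (the selectors are my own and the keyword values are taken from the draft as I remember it):

```css
/* Emphasized text: stronger stress and a higher pitch */
em {
  voice-stress: strong;
  voice-pitch: high;
}

/* Fine print: a different voice, spoken faster and more monotonously */
.fine-print {
  voice-family: female;
  voice-rate: fast;
  voice-range: low;
}
```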

Example

Now that you’ve had a brief introduction to the main properties of the CSS speech module, we can take a look at an example. The snippet below is taken directly from the draft and shows what an implementation with the module would look like:

h1, h2, h3, h4, h5, h6 {
  voice-family: paul;
  voice-stress: moderate;
  cue-before: url(../audio/ping.wav);
  voice-volume: medium 6dB;
}

p.heidi {
  voice-family: female;
  voice-balance: left;
  voice-pitch: high;
  voice-volume: -6dB;
}

p.peter {
  voice-family: male;
  voice-balance: right;
  voice-rate: fast;
}

span.special {
  voice-volume: soft;
  pause-after: strong;
}

<h1>I am Paul, and I speak headings.</h1>
<p class="heidi">Hello, I am Heidi.</p>
<p class="peter">
  <span class="special">Can you hear me ?</span>
  I am Peter.
</p>

Conclusion

The CSS speech module wasn’t implemented by any of the major browsers after it reached CR status, and it was retired in 2018 (sadly, I couldn’t find the exact reason for the retirement anywhere).

It could have helped users who are blind or visually impaired by optimizing content for a better listening experience. But maybe these users would find it more disturbing than helpful if a website messed around with things like the speed of the voice, because they are used to the personal settings of their screen reader.

What do you think about the idea and goals of the CSS speech module?