The Speech Transmission Index is built on over 100 years of acoustic and sound engineering research combined with a clever twist on classifying optical systems. It involves complex maths and complicated theories, some of which are not fully formed, whilst others are coming under closer scrutiny.

But fear not, you don’t need to be a certified mathematician, as STI (or rather the STI PA measurement) has condensed all that into a single 15-second test. A test you can do with a handheld meter, or even your smartphone! That’s a triumph considering the full STI needs 98 ten-second tests. We’re talking more than 16 minutes to get just one result in one location.
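If you like to check the sums, here is a rough sketch in Python. The split of 98 into 7 octave bands times 14 modulation speeds is not from this article; it is how the full method is usually described, so treat it as an assumption. The names are made up for illustration.

```python
# Rough arithmetic only: the timings are the article's round figures.
OCTAVE_BANDS = 7            # assumption: octave bands tested by full STI
MODULATION_SPEEDS = 14      # assumption: modulation speeds tested per band
SECONDS_PER_FULL_TEST = 10  # from the article
STIPA_SECONDS = 15          # from the article


def full_sti_test_count():
    """One test signal per band and modulation speed combination."""
    return OCTAVE_BANDS * MODULATION_SPEEDS


def full_sti_minutes():
    """Total measuring time for one location, in minutes."""
    return full_sti_test_count() * SECONDS_PER_FULL_TEST / 60


if __name__ == "__main__":
    print(full_sti_test_count(), "tests")
    print(f"{full_sti_minutes():.1f} minutes for the full method")
    ratio = full_sti_minutes() * 60 / STIPA_SECONDS
    print(f"about {ratio:.0f} times longer than one STI PA test")
```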

Now to be serious for a moment, the mystery of STI conceals booby traps that lie ready to blow big holes in your profits. So let’s dilly-dally no more…

The mystery starts with what STI stands for

Speech Transmission Index rates the amount a signal changes from the start to the end of its journey. It is a speech intelligibility measurement that predicts how the characteristics of the channel (the room, for example) affect speech intelligibility.

We’re interested in speech, not music, so the test signal has the main elements of human speech.

The STI PA (Speech Transmission Index for Public Address systems) test signal is a version of the STI. It is based on male speech. Why? Because female speech is said to be more intelligible, so a result measured with the male signal can only improve in real use.

The test signal needs to take the same journey a real announcement would take: that’s from the announcer’s lips to the listener’s ear.

Three important facts:

  1. The journey should start at the announcer’s lips, although the signal can instead be fed into an input on the equipment.
  2. It doesn’t care about the announcer’s speaking skills or the listeners’ hearing acuity.
  3. The test assumes the listener has one ear – so actual intelligibility should be better.

The test compares the original test signal to that in the listening area. Index means rating.

STI PA is a way to get an STI result, so don’t make my mistake of saying 0.6 STIPA; the correct format is 0.60 STI.

What does the Speech Transmission Index measure?

I’m using animal names for this. No reason except my childish humour. I’m imagining you mouthing ‘baboon’, ‘hippopotamus’, ‘gazelle’ and ‘lion’ and those around you thinking you’ve finally cracked.

First baboon…

When you say it, some parts are louder than others.

‘Ba’ starts loud, then tails off, ‘boo’ does much the same, with the quieter ‘n’ to finish.

Here’s a graph of ‘Ba’, ‘boo’, and ‘n’ based on their loudness, officially called intensity.

[Graph: intensity of ‘ba’, ‘boo’ and ‘n’ over time]

‘Ba’ and ‘Boo’ start similar but sound different. Those differences are for two reasons:

  1. Contained in that shape are faster peaks and troughs, and these are different for ‘ba’ and ‘boo’, and
  2. ‘boo’ takes longer to say than ‘ba’ – duh.

The proper name for that loudness shape is the intensity envelope, and the proper name for those peaks and troughs is modulations.

Those faster modulations are not even the same speed, so STI groups them by their speed.
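To make ‘intensity envelope’ and ‘modulations’ a little more concrete, here is a minimal Python sketch. It is not how an STI meter works inside. It builds a made-up tone whose loudness rises and falls a set number of times a second, then measures how deep those rises and falls are. Every name and number in it is an assumption for illustration.

```python
import math


def modulated_tone(mod_hz, depth, carrier_hz=500.0, rate=8000, seconds=2.0):
    """A steady tone whose intensity rises and falls mod_hz times a second."""
    samples = []
    for n in range(int(rate * seconds)):
        t = n / rate
        intensity = 1.0 + depth * math.cos(2 * math.pi * mod_hz * t)
        samples.append(math.sqrt(intensity) * math.sin(2 * math.pi * carrier_hz * t))
    return samples


def intensity_envelope(samples, window):
    """Mean squared amplitude over back-to-back windows: a crude envelope."""
    return [
        sum(s * s for s in samples[i:i + window]) / window
        for i in range(0, len(samples) - window + 1, window)
    ]


def modulation_depth(envelope):
    """(peak - trough) / (peak + trough): 1.0 is deep, 0.0 is completely flat."""
    peak, trough = max(envelope), min(envelope)
    return (peak - trough) / (peak + trough)


if __name__ == "__main__":
    tone = modulated_tone(mod_hz=2.0, depth=0.8)
    envelope = intensity_envelope(tone, window=80)  # 80 samples = 10 ms here
    print(f"measured depth: {modulation_depth(envelope):.2f}")
```

A slow, deep modulation like this one is the ‘ba’ and ‘boo’ shape in miniature; the faster peaks and troughs would show up as a higher `mod_hz`.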

Which brings me to hippopotamus…

Say hippopotamus.

To say ‘hippopotamus’, your lips and tongue move more, and at a faster rate, than they do for ‘baboon’: it has more of the faster modulations.

Imagine visiting an old church with a mate. You tell your mate you will say two words to them. You walk ten metres away, turn and first say, “Hippopotamus”, then a few seconds later, “Baboon.”

Once your mate has stopped laughing, they’ll tell you “hippopotamus” was harder to understand than “baboon”. That’s because an old church will be echoey and faster modulations don’t like echoey. I know the correct word is reverberation.

Reverberation Is Like Snow

Stay with me; it’s not as mad as it seems.

Picture a place you know well, your garden, street or outside your office.

On a summer’s day, everything is visible, from tiny ants running around to large structures like cars, garden tables and sheds.

Now when it snows, detail is lost. With enough snow, kerbs and steps get hidden. Large structures, like cars, are still visible, but the windscreen wipers have been covered and other sharp details smoothed over.

The bigger structures (cars and sheds) are like lower-pitched sounds, and smaller things (ants, windscreen wipers) are like higher-pitched sounds. As the snow builds up, more and more detail disappears, and with major snowfall even the cars are buried.

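The amount of snow is like the reverberation time, and the size of the structure is like the pitch. (Strictly speaking, it’s the speed of the modulations rather than the pitch, but pitch will do for the picture.)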

A little snow covers the finer details, the equivalent of shorter reverberation times affecting the higher-pitched sounds.

Heavier snowfall buries more detail, the same as longer reverberation times bury lower-pitched sounds.

STI’s simple but clever bit…

Once STI has grouped the word parts by their modulation speed, it then works out how those groups differ from the original perfect signal.

If the changes are only in the fastest modulation groups, then the reverberation time must be quite short, and the STI number will be better.

With longer reverberation, the groups with slower modulations get affected, lowering the STI number.
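Here is a minimal Python sketch of that idea. It uses the textbook formula for an idealised room with a smooth, even decay, which is an assumption on my part and not something a real church will obey exactly. The function name and the handful of modulation speeds are chosen for illustration.

```python
import math

# A few modulation speeds in Hz, slowest to fastest (illustrative subset).
MODULATION_HZ = [0.63, 1.25, 2.5, 5.0, 12.5]


def reverb_mtf(mod_hz, reverb_seconds):
    """Fraction of the modulation depth that survives the reverberation.

    Assumes an ideal, evenly decaying room; 1.0 means untouched,
    values near 0.0 mean the peaks and troughs have been smoothed flat.
    """
    x = 2.0 * math.pi * mod_hz * reverb_seconds / 13.8
    return 1.0 / math.sqrt(1.0 + x * x)


if __name__ == "__main__":
    for seconds in (0.5, 3.0):  # a furnished office versus an old church, say
        kept = ", ".join(f"{reverb_mtf(f, seconds):.2f}" for f in MODULATION_HZ)
        print(f"{seconds} s reverberation keeps: {kept}")
```

With the shorter time only the fastest groups lose much depth; with the longer time the slower groups suffer as well, which is the snow getting deeper.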

So STI measures the effect reverberation has on speech?

Yes, it does, but STI deals with noise too.

Noise is like a lake

The beautiful place below has snow sitting on mountain tops, smoothing out the detail (like reverberation). The lake covers all the low-lying land; for all we know, submerged in the lake is an ancient city.

[Image: a lake beneath snow-topped mountains]

Importantly, above the waterline, the lake has no impact on the details.

So whilst reverberation affects the frequency or pitch of the speech part, noise submerges all speech parts that are quieter than the noise, no matter their frequency. Sound geeks call this the noise floor.

How STI calculates the damage noise causes intelligibility

STI measures the average sound level like a dB meter would. (Sorry, I should say SPL meter.)

Without noise, the average sound level would include all the quiet bits. Flood the place with noise, and those quieter bits get submerged. The average sound level is now raised, telling STI the intelligibility won’t be so good.
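A minimal Python sketch of the lake, assuming steady noise and using the standard textbook relationship between signal-to-noise ratio and modulation depth. The formula is not given in this article, and the function name is made up.

```python
def noise_mtf(snr_db):
    """Fraction of the modulation depth left above steady noise.

    snr_db is how far the speech sits above the noise, in dB.
    Noise adds a constant intensity, which raises the average level
    without raising the peaks by the same proportion.
    """
    return 1.0 / (1.0 + 10.0 ** (-snr_db / 10.0))


if __name__ == "__main__":
    for snr_db in (30, 15, 0, -15):
        print(f"speech {snr_db:+d} dB relative to noise keeps {noise_mtf(snr_db):.2f}")
```

When speech and noise are equally loud (0 dB), half the depth has gone under water.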

STI PA summed up with orange snow, a baboon and noise

We’ve covered how the bits of this mystery work separately, so here’s what they do together and how STI handles it all.

First, that ‘Baboon’ graph, but with reverberation added.

[Graph: ‘baboon’ intensity with reverberation added in orange, over the original as a dotted blue line]

The dotted blue line is the original graph you saw earlier. The orange looks a bit like snow sitting on top, don’t you think? Yes, all right, orange snow.

With reverberation, the peaks are louder, which can be quite useful. The bad news is that those earlier sounds linger, so the troughs are not so deep.

Now we’ll remove reverberation and replace it with noise.

[Graph: ‘baboon’ intensity with noise added]

Just as the lake covers everything low down, so the quieter bits of ‘baboon’ are lost in the noise.

Put them together, and we get this…

[Graph: ‘baboon’ intensity with both reverberation and noise added]

You’ll notice that the noise is also louder. Well, if reverberation makes the speech louder, it’s going to do the same for noise.

Speech Transmission Index in two sentences – well, almost

STI measures the depth between the peaks and troughs (speech modulations). The shallower that depth, the lower your STI number.

In the maths world, they call such comparisons a transfer function.

That gives us the term Modulation Transfer Function (or speech journey comparison), which is the basis of the Speech Transmission Index.
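For the curious, here is a heavily simplified Python sketch of turning ‘how much depth survived’ into a single number between 0 and 1. It treats every modulation group equally and ignores the frequency bands, their weightings and everything the ear does, so it is a teaching toy and not the calculation in the standard. The function names are my own.

```python
import math


def transmission_index(depth_kept):
    """Map the fraction of modulation depth kept (0 to 1) onto a 0 to 1 score."""
    m = min(max(depth_kept, 1e-6), 1.0 - 1e-6)    # keep the log well behaved
    apparent_snr_db = 10.0 * math.log10(m / (1.0 - m))
    apparent_snr_db = max(-15.0, min(15.0, apparent_snr_db))  # limit to +/-15 dB
    return (apparent_snr_db + 15.0) / 30.0


def simplified_sti(depths_kept):
    """Plain average of the scores; a real measurement weights its bands."""
    scores = [transmission_index(m) for m in depths_kept]
    return sum(scores) / len(scores)


if __name__ == "__main__":
    clean = [1.0, 1.0, 1.0, 1.0]
    snowy = [0.9, 0.7, 0.5, 0.3]   # made-up values: faster groups hit hardest
    print(f"{simplified_sti(clean):.2f} STI (toy)")
    print(f"{simplified_sti(snowy):.2f} STI (toy)")
```

The depth values fed in are invented; in a real test they come from comparing the received signal with the original.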

Here’s the missing part: it’s important

It’s called the Upward Spread of Masking.

I wrote the following sentence, and it confused me, but I still can’t find a better way of putting it – sorry…

“A lower-pitched part of speech makes it harder to understand a higher-pitched part of speech that follows.”

None the wiser? Me too, and I wrote it.

BS EN 60268-16:2020 (the STI PA standard, no less) puts it this way…

 “When a loud, low-frequency sound is presented at the ear, it always masks higher frequencies, possibly rendering them inaudible if the difference between their relative levels exceeds a given threshold.”

More accurate than mine, but still confusing.

Time for a couple more animal names, methinks.

First Gazelle…

‘Ga’ is pitched lower than ‘zelle’; therefore ‘Ga’ makes ‘zelle’ harder to understand, and the louder you say gazelle, the worse this problem gets.

Importantly, this is caused by the way our ears work. This means shouting louder or turning up the PAVA system volume level won’t help with intelligibility or your STI number.

Now Lion…

‘Li’ is pitched higher than ‘on’, so it does not suffer from this problem.

That was all easier with animal names.

This masking effect probably explains why loudhailers are useless.

That’s the end of the beginning.

You now know the important things about STI PA. Next up is an article on how to save 35% on your PAVA quotation.
