Teaching Vowels Physically – Yes or No?

This article discusses whether teaching physical tongue/lip/jaw positions for vowels is an effective approach. Information in brackets next to quotes, e.g. (Cruttenden, 2014, p.35), refers to works listed in the bibliography. Enjoy reading!

Alternatively, a more academic/detailed version of this article was published in the Voice and Speech Review in 2017.

Introduction & Ultrasound Machines

In August 2015 I attended the International Congress of Phonetic Sciences in Glasgow. The quadrennial conference brings together academics (and others) from across the world to present their research and discuss everything to do with phonetics. Between lectures, you could wander around the stalls in the main conference hall. Many of these displayed various phonetic books you could purchase, but one of them featured ultrasound machines.

“Ultrasound machines!” (I hear you cry) “Aren’t they for pregnant women?”

Well, yes. But they’re also used for speech research. It’s not easy to accurately feel or see what your tongue is doing inside your mouth – let alone give precise measurements. Ultrasound machines make it simpler to determine tongue position.

Here are some pictures of ultrasound machines (the first is more modern):

Image © www.articulateinstruments.com
Image © www.articulateinstruments.com

Here’s an ultrasound video. You can see the tongue moving as the subject speaks (the tip of the tongue is on the left):

Video taken from https://www.youtube.com/watch?v=J7reyZwdZL0

I spent some time playing with these machines because I wanted to clarify something: do vowels have exact tongue, lip, and jaw positions that could and/or should be taught (for English learners, or actors learning a different accent)?

I discussed the issue with a phonetics lecturer at a UK university. The lecturer insisted that there were exact tongue positions for vowels. This is perhaps the view of many English pronunciation textbooks on the market, which give tongue diagrams for each individual vowel sound.

I also spoke to a PhD student who was doing an ultrasound study of tongue positions for vowels. As an undergraduate, she was taught that vowels were made with specific tongue positions. However, while doing research for her PhD she had discovered that different people used different tongue positions for the same vowel.

Confusing… Let’s explore the issue from the beginning…

What are Vowels?

Vowels differ from consonants in that the airflow is not obstructed when you make them. Let’s test this out. Make the following consonant sounds: [p], [t] and [k].

Can you feel how the air is being stopped and then released? It’s reasonably easy to feel what your lips (for [p]) or your tongue (for [t] and [k]) are doing.

Now make some more consonant sounds: [f], [z], [ʃ].

Can you feel how the air is being obstructed in some way? You can probably feel some friction – and you may be able to feel where your lips (for [f]) or tongue (for [z] and [ʃ]) are in your mouth.

Finally make some vowels: [i], [ɛ], [a].

Can you feel how the air is not obstructed? It’s probably more challenging for you to feel exactly where your tongue is for the vowel sounds.

The Traditional Articulatory Description of Vowels

In the past, phoneticians (=people who study speech, not to be confused with Phoenicians) didn’t have the technical equipment (such as MRI and ultrasound) to look inside their mouths. They also didn’t have the equipment to measure sound in terms of its acoustic properties (such as formants). This meant they had to rely on their own proprioception (=awareness of what oneself is doing) to describe vowel sounds.

Vowels were described in terms of whether the tongue was high/low and front/back (and the lip position). These ideas were further developed by Alexander Melville Bell (1819-1905), who was the father of the inventor of the telephone, Alexander Graham Bell. (You can see Melville Bell’s ideas depicted in the film My Fair Lady.)

In my lessons, I ask students to hold their jaws open and make vowels like [i] and [ɑ] in order to see where the tongue is moving. It’s quite clear that the tongue does move high and front for [i], whereas it is low and back for [ɑ]. These movements are reasonably easy to feel in your mouth and see in a mirror.

Certain positions and gross movements of the tongue can be felt. (Cruttenden, 2014, p.35)

Cardinal Vowels and the Vowel Quadrilateral

Subtler tongue movements for other vowels seem more challenging to see and feel. To help solve this problem, British phonetician Daniel Jones (1881-1967) created his system of Cardinal Vowels (abbreviated to CVs) in 1917. These could be used as reference points to describe the position of any vowel sound.

There are 18 Cardinal Vowels, but we’ll just take a look at a few of them. For example, Cardinal Vowel 1 [i] is made with the tongue as high and forward in the mouth as possible (without creating friction and thus turning into a consonant). CV 8 [u] is made with the tongue as high and back in the mouth as possible. CV 5 [ɑ] is made with the tongue as low and back in the mouth as possible. We also have CV 4 [a] which sits in the remaining corner. The Handbook of the International Phonetic Association (1999) presents the following diagram on page 11:

image © International Phonetic Association

It is described as:

…a mid-sagittal section of the vocal tract with four superimposed outlines of the tongue’s shape. (p.10).

The symbols [i, u, ɑ, a] are placed at the points judged to be the highest point of the tongue. The Handbook states that:

…joining the circles representing the highest point of the tongue in these four extreme vowels gives the boundary of the space within which vowels can be produced. For the purposes of vowel description this space can be stylized as the quadrilateral shown… (p.12)

image © International Phonetic Association

Here we have a diagram to help us find vowels in the mouth. You can see that some other vowel symbols [e, ɛ, o, ɔ] have been placed on the vowel quadrilateral. These are placed at “acoustically equidistant” intervals between the four corner vowels – we’ll come back to this term later. (FYI the vowel quadrilateral is also known as the vowel diagram, vowel trapezium, or vowel chart.)

Of course, vowels can actually be made in any part of the vowel quadrilateral. Rather than have different symbols for every point on the diagram we can place dots where we think the vowel is and then use the nearest Cardinal Vowel symbol to mark it.

The quality of a particular vowel can be indicated by placing a dot on the diagram. The meaning of this dot is something like: the vowel x sounds as if it is produced with the highest part of the tongue in this position. (Ashby & Maidment, 2005, p.76)

image © Journal of the International Phonetic Association

The diagram above plots the vowels of Italian (Rogers & d’Arcangeli, 2005). Comparing it with the previous vowel quadrilateral showing the Cardinal Vowels, you can see that the Italian vowels are made in slightly different positions. Cardinal Vowel 4 [a] in Daniel Jones’s system is made with the tongue as low and front as possible, whereas in Italian this vowel sound is actually made in a more central position.
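For readers who like to see an idea spelled out, the "place a dot, then use the nearest Cardinal Vowel symbol" procedure can be sketched in a few lines of Python. This is only a rough illustration: the coordinates below are hypothetical positions on a stylised chart (0 to 1 from front to back, 0 to 1 from open to close), not measurements of any tongue or any recording, and the function name nearest_cardinal is invented for this sketch. Straight-line distance on the drawing is likewise an assumption made for convenience.

```python
import math

# Hypothetical chart coordinates for the eight primary Cardinal Vowels.
# x: 0.0 = front edge, 1.0 = back edge; y: 0.0 = open (bottom), 1.0 = close (top).
# These numbers are illustrative assumptions about a stylised drawing,
# not measurements of tongue position or of acoustics.
CARDINAL_VOWELS = {
    "i": (0.0, 1.0),
    "e": (1 / 6, 2 / 3),
    "ɛ": (1 / 3, 1 / 3),
    "a": (0.5, 0.0),
    "ɑ": (1.0, 0.0),
    "ɔ": (1.0, 1 / 3),
    "o": (1.0, 2 / 3),
    "u": (1.0, 1.0),
}


def nearest_cardinal(dot):
    """Return the Cardinal Vowel symbol whose chart position is closest to dot."""
    return min(
        CARDINAL_VOWELS,
        key=lambda symbol: math.dist(dot, CARDINAL_VOWELS[symbol]),
    )


# A hypothetical dot for an open vowel placed somewhat back of CV 4.
print(nearest_cardinal((0.7, 0.05)))  # prints "a"
```

With these made-up coordinates, a dot placed between [a] and [ɑ] but closer to [a] is still labelled [a], which mirrors how a vowel that sits away from a corner of the chart can keep the nearest Cardinal Vowel symbol.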

The Articulatory Facts

At this point in our discussion, it seems pretty clear that the vowel quadrilateral depicts exact tongue positions. However, the Handbook of the International Phonetic Association says the following:

[vowels] are classified in terms of an abstract ‘vowel space’, which is represented by the four-sided figure known as the ‘Vowel Quadrilateral’…This space bears a relation, though not an exact one, to the position of the tongue in vowel production… (1999, p.10)

the vowel quadrilateral must be regarded as an abstraction and not a direct mapping of tongue position. (p.12)

Given that the Handbook has provided an image of tongue positions for [i, u, ɑ, a], it’s understandable if you’re now a bit confused.

Earlier I said that the Cardinal Vowels were placed at “acoustically equidistant” intervals. Gimson’s Pronunciation of English appears to state that these vowels had tongue positions that were equidistant from an articulatory perspective:

…tongue positions of these qualities were X-rayed and were indeed found to be fairly equidistant from a spatial point of view. (2014, p.36)

But then the phonetician Peter Ladefoged says:

[Daniel] Jones never defined what he meant by saying that the cardinal vowels were acoustically equidistant. He thought that the tongue made equal movements between each of them, even after the publication of x-ray views of the 8 primary cardinal vowels produced by his colleague Stephen Jones showed that this was not the case… Daniel Jones himself published photographs of only four of his own cardinal vowels, although, as he told me in 1955, he had photographs of all 8 vowels. When I asked him why he had not published the other four photographs, he smiled and said “People would have found them too confusing”. (An academic life, p.1)

In fact, the American speech scientist George Oscar Russell published an X-ray study of tongue positions for vowels in 1928. A reviewer of the study in 1929 stated:

…the evidence demolishes the theory that vowel quality is solely or even chiefly dependent upon the position of the surface of the tongue. (Sturtevant, 1929, p.34)

These quotes appear to completely contradict what Gimson’s Pronunciation of English says.

To add to the confusion, it is now evident that you can make the same vowel sound with different tongue positions:

It must be understood that this diagram is a highly conventionalised one which shows, above all, quality relationships. Some attempt is, however, made to relate the shape of the figure to actual tongue positions… Nevertheless it has been shown that it is possible to articulate vowel qualities without the exact tongue and lip positions which this diagram seems to postulate as necessary. (Cruttenden, 2014, pp.37-38)

…x-ray film studies have shown that some speakers use their jaw for changing vowel height, while others barely move their jaw and instead appear to change the shape of their tongue. (Gick et al, 2003, p.155)

Some people vary the height of the tongue in heed, hid, head, had mainly by using the genioglossus muscle, others make more use of the mylohyoid muscle, and yet others control tongue position more by raising and lowering the jaw. You can produce the required tongue shape in several different ways. (Ladefoged & Disner, 2012, p.128)

In a recent ultrasound study on American English vowel sounds, Jonathan Havenhill states the following:

While some speakers distinguish /ɔ/ from /ɑ/ with a combination of tongue position and lip rounding, others do so using either tongue position or lip rounding alone. (2015, p.1)

Wait – so does this mean that the vowel quadrilateral is useless?

…what the cardinal vowel model provides is a mapping system which presents what is essentially auditory and acoustic information in a convenient visual form. (Collins & Mees, 2013, p.67)

The Auditory Vowel Space?

OK, so now we’ve been told that this vowel quadrilateral represents an auditory space rather than an articulatory one. That is possibly more confusing, but let’s take a look at what some phoneticians say:

These early phoneticians were much like astronomers before Galileo…. [they] were certain they were describing how the stars and planets went around the earth. But they were not. The same is true of the early phoneticians. They thought they were describing the highest point of the tongue, but they were not. They were actually describing formant frequencies. (Ladefoged & Disner, 2012, pp.131-132)

Phoneticians are thinking in terms of acoustic fact, and using physiological fantasy to express the idea. (George Oscar Russell quoted in Ladefoged & Johnson, 2011, p.198)

It used to be thought that the position of the tongue within the mouth, together with lip position, fully determined the quality of a vowel. It is now known that the configuration of the entire vocal tract needs to be taken into account. The vowel quadrilateral has been retained as a useful tool, but it should be thought of as representing an auditory space, rather than an accurate articulatory one. (Ashby & Maidment, 2005, p.77)

Students of phonetics often ask why we use terms like high, low, back, and front if we are simply labeling auditory qualities and not describing tongue positions. The answer is that it is largely a matter of tradition. For many years, phoneticians thought they were describing tongue positions when they used these terms to specify vowel quality. But there is only a rough correspondence between the traditional descriptions in terms of tongue positions and the actual auditory qualities of vowels. (Ladefoged & Johnson, 2011, p.89)

I won’t go into detail about acoustic vowel qualities/formants, but here’s something that can help you understand how vowels sound different across the quadrilateral. Do this experiment: whisper [i], then whisper [a], then whisper [ɑ]. Which vowel sounds brightest (or “highest-pitched”) and which darkest (or “lowest-pitched”)?

You may hear that [i] is brightest, [a] is less bright, and [ɑ] is the darkest. You may also perceive brightness as higher-pitched and darkness as lower-pitched (but it’s not actually pitch that you’re hearing because the vocal folds are not vibrating). On the chart, the top-left corner sounds brightest and the bottom-right corner darkest, with gradations in between. Perhaps this scale from brightness to darkness offers a way of navigating through the quadrilateral?

Can we Honestly and Effectively Teach Vowels Physically?

By now it should be clear that it is not accurate to give exact tongue positions for vowels. So should we dispense with the vowel quadrilateral once and for all? I say no. As does Gimson’s Pronunciation of English:

…it is convenient to have available a rough scheme of articulatory classification. (2014, p.39)

And the phonetician J. C. Catford says:

The traditional way of classifying vowels works well in practice, and, indeed is the only basis for the successful acquisition of practical skill in producing, identifying, and classifying vowels… [it is] helpful to use a hand mirror…so that one can correlate the visible movements and positions of the tongue and lips, with the proprioceptive sensations, and also with the auditory sensations when they are whispered or voiced. (2001, pp.119-120)

I agree. In fact, I have created my own vowel quadrilateral superimposed on a picture of the mouth. I use this when teaching vowels to English learners with really good results. Even when I tell students that it is an approximation rather than an exact representation of tongue position, they still find it extremely useful to visualise where the vowels are in the mouth – and where one vowel is (physically/auditorily) in relation to others.

Another benefit is that they can concentrate on the tongue rather than the jaw. This is important because if, for example, you teach a student that [a] has an open jaw position, then that student may start to overextend the jaw every time they have to pronounce a word containing [a]. This can lead to jaw problems such as TMD (temporomandibular disorder). For any coaches/teachers reading, please do not teach jaw positions for specific vowels: simply tell students to relax their jaw and focus on the tongue.

As referenced earlier (Havenhill, 2015, p.1), the same vowel can be made with different configurations of the tongue and lips. Some speakers may make a vowel with the tongue more back and the lips relaxed, others with the tongue more forward and the lips more rounded. For this reason, it is not accurate to say that a vowel has one exact lip position. However, it is useful to give general guidance for lip position, and to adjust a student’s lip position if they are unable to produce a certain vowel sound. English pronunciation textbooks often prescribe strongly rounded lips for the /uː/ vowel sound and strongly spread lips for /iː/. However, I hardly ever see native speakers do this. The lips are far more relaxed in SSBE (Standard Southern British English) than many textbooks state.

In conclusion, I recommend the following:

  1. Do teach vowels physically. It really helps students and gives them another route to learning (as well as listening).
  2. If you use the vowel quadrilateral or a tongue diagram, then do ensure the student knows it is an approximation. Otherwise, they will spend hours attempting to move their tongue into the described position – when we now know that vowels can be made using other positions.

Further Interesting Notes

1. The vowel quadrilateral only describes what the front/back of the tongue is doing. What about vowels that are made with the tongue curled back in a retroflex position (such as some rhotic speakers’ pronunciations of nurse)?

2. Every phonetics book that talks about the vowel quadrilateral talks about the tongue arching towards the vowel dots. However, one book for actors (Speaking with Skill by Dudley Knight) describes the tongue arching upwards for high vowels and cupping downwards for low vowels. Here are a couple of quotes from the book:

The vowel quadrilateral is an attempt to provide a schematic representation of the relative position of the articulators while engaging in the acoustic shaping of the vowels. (2012, p.166)

The total space within the vowel quadrilateral represents – in an imprecise, schematic way – the action of arching or cupping of the front, middle or back of the body (or dorsum) of the tongue. It does not involve the tip or the blade. (p.177)

Catford (2001) seems to disagree with the idea of cupping the tongue:

…note that having the tongue ‘as low as possible’ does not mean that the tongue is hollowed: it must retain essentially the same convexity as it has for all the other front vowels. (p.139)

The evidence above shows that exact tongue positions for vowels (and arching/cupping) are not scientifically accurate. However, perhaps the idea of cupping the tongue towards a point in the mouth will help some people create a particular vowel that is otherwise elusive to them.

3. If you are interested in purchasing an ultrasound machine for speech research/fun, then you can visit the Articulate Instruments website (www.articulateinstruments.com).

4. If you are interested in seeing the tongue move for particular sounds (via ultrasound and MRI) then have a browse of Seeing Speech’s IPA chart, and the Dynamic Dialects page.



Bibliography

Ashby, M & Maidment, J. (2005). Introducing Phonetic Science. Cambridge: Cambridge University Press.

Catford, J.C. (2001, 2nd ed). A Practical Introduction to Phonetics. Oxford: Oxford University Press.

Collins, B & Mees, I. (2013, 3rd ed). Practical Phonetics and Phonology. New York: Routledge.

Cruttenden, A. (2014, 8th ed). Gimson’s Pronunciation of English. Abingdon: Routledge.

Gick, B et al (2003). Articulatory Phonetics. Chichester: Blackwell Publishing.

Handbook of the International Phonetic Association. (1999). Cambridge: Cambridge University Press.

Havenhill, J. (2015). An Ultrasound Analysis of Low Back Vowel Fronting in The Northern Cities Vowel Shift. In The Scottish Consortium for ICPhS 2015 (Ed.), Proceedings of the 18th International Congress of Phonetic Sciences. Glasgow, UK: the University of Glasgow. Paper number 0921.

Knight, D (2012). Speaking with Skill. London: Bloomsbury Publishing Plc.

Ladefoged, P & Johnson, K. (2011, 6th ed). A Course in Phonetics. Canada: Wadsworth Cengage Learning.

Ladefoged, P & Ferrari Disner, S. (2012, 3rd ed). Vowels And Consonants. Chichester: Blackwell Publishing Ltd.

Ladefoged, P. An academic life. Online.

Rogers, D & d’Arcangeli, L. (2005). The sound pattern of Standard Italian, as compared with the varieties spoken in Florence, Milan and Rome. Journal of the International Phonetic Association, 35(2), pp.131-151. Online.

Sturtevant, E. H. (1929). Reviewed Work: The Vowel: Its Physiological Mechanism as Shown by the X-Ray by G. Oscar Russell. Language, 5(1), pp.33-36. Online.