English has five vowel letters. Six if you let y in on the right days. Say cat, father, saw, day, care, about out loud and listen to the first vowel in each: that single letter a just did six completely different things to your mouth. The letters are a rough code. The sounds underneath them are something else, and there are far more of them than there are letters to write them with.
A General American mouth makes around twenty distinct vowel sounds. The exact number depends on how you count, and people who study this for a living argue about it. The deeper problem is that the spelling almost never tells you which sound you’re aiming for. Through, though, thought, tough, bough: one spelling, ough, and five different vowels. This page lays all of them out by the only thing that’s reliable, which is what your mouth is actually doing.
English spells its vowels with five letters, but a General American mouth produces around twenty distinct vowel sounds. They fall into three families: the simple vowels (single steady sounds like the ones in see, cat, moon), the diphthongs (gliding sounds that move while you say them, like day and go), and the r-colored vowels (a vowel fused with the American R, as in car and bird). The number is fuzzy on purpose: this chart lays out twenty-two, but many Americans merge the vowels in cot and caught, and the schwa /ə/ and the vowel in fun are often counted as one. The single most useful habit this chart can give you: stop trusting the spelling and start learning each vowel by its sound and the shape your mouth makes for it.
Five letters, far more sounds
A vowel is the open sound at the core of a syllable. When you make one, air flows out of your mouth without being blocked or pinched anywhere; your tongue and lips just shape the empty space, and the size and position of that space are what separate one vowel from another. Consonants are the opposite: they’re made by closing or constricting the airway (the lips meeting for /b/, the tongue stopping the air for /t/). Every syllable is built around a vowel, and the vowel carries the stress and most of the volume. It’s the part of the syllable that sings.
Because the space inside your mouth can be adjusted by tiny amounts, the number of vowels a language can distinguish is large and the boundaries between them are fine. Spanish settles on five. Japanese on five. English, for historical reasons that have nothing to do with logic, settles on roughly four times that many, and then writes them all with the same five letters it inherited from Latin. That mismatch is the whole difficulty. A learner who pronounces English vowels the way the letters suggest will produce the wrong sound most of the time, because the letters were never a reliable map of the sounds.
So this chart is organized by sound, not by letter. Each vowel gets an anchor word (a common word whose vowel is unmistakable) and the IPA symbol that phoneticians use to write it. Learn the anchor word and you have a permanent handle on the sound, regardless of how any particular word chooses to spell it.
How vowels are described
Every vowel is defined by three things your mouth is doing, and once you can feel them you can place any vowel without memorizing it.
The first is how open your jaw is, which phoneticians call height. Say see and then saw. For see your jaw is nearly closed and the vowel feels high in your mouth; for saw your jaw drops and the vowel feels low. The second is where the body of your tongue bunches, called frontness. Say see and then moon: same nearly-closed jaw, but for see the tongue is pushed forward and for moon it’s pulled back, with a central position in between. The third is what your lips do. For moon they round into a small circle; for see they spread. In English the rounded vowels are almost all back vowels, so the two usually go together.
Plot every vowel on those axes (high to low, front to back) and you get the shape phoneticians draw as a lopsided four-sided figure, the vowel quadrilateral, with see in the top-left corner, moon in the top-right, the cat vowel at the bottom-left, and the open father vowel toward the bottom-right. You don’t need the diagram to use this article, but the intuition behind it is worth keeping: vowels live in a continuous space, and the named vowels below are the points in that space that English has decided to treat as separate.
Two more distinctions matter for American English specifically. One is tenseness. Some vowels are made at the edges of the vowel space and held a touch longer (the vowel in sheep); their shorter, laxer partners sit a little inside them (the vowel in ship). These tense–lax pairs are the single biggest source of vowel trouble for learners, because many languages have only one vowel where English has two. The other is whether the vowel glides. A simple vowel holds one steady position. A diphthong starts at one position and slides to another while you’re still saying it, which is why day and go feel like they’re moving. Gliding separates the first two families below: simple vowels hold still while diphthongs move, and tenseness is the fault line running through the simple vowels themselves.
The simple vowels
These are the steady, single-position vowels, what phoneticians call monophthongs. American English has nine of them. Read each anchor word out loud and let the vowel sit still; none of these should move while you say it.
| Sound | IPA | Anchor word | Respelling | Where it sits |
|---|---|---|---|---|
| SEE vowel | /i/ | see, beat, sea | ee | High front, tense, lips spread |
| SIT vowel | /ɪ/ | sit, ship, bit | ih | High front, lax (the relaxed see) |
| BED vowel | /ɛ/ | bed, met, bet | eh | Mid front |
| CAT vowel | /æ/ | cat, bat, sad | a | Low front, the famously American vowel |
| FATHER vowel | /ɑ/ | father, hot, cot | ah | Low back, jaw wide open |
| SAW vowel | /ɔ/ | saw, caught, sought | aw | Low-mid back, lips slightly rounded |
| BOOK vowel | /ʊ/ | book, full, look | uu | High back, lax, lightly rounded |
| MOON vowel | /u/ | moon, fool, who | oo | High back, tense, lips rounded |
| FUN vowel | /ʌ/ | fun, luck, cut | uh | Mid-central; its unstressed twin is the schwa |
Those rows hide the three contrasts that trip up the most learners: two tense–lax pairs and one vowel a lot of languages don’t have.
The first pair is /i/ versus /ɪ/: sheep and ship, beat and bit. The see vowel is tense, with the tongue high and to the front and the sound held a fraction longer; the sit vowel is its relaxed version, a little lower and shorter and softer. Speakers whose first language has only one high front vowel tend to merge these, and ship drifts toward sheep. The second pair is /u/ versus /ʊ/: fool and full, pool and pull. Same tense-versus-relaxed relationship at the back of the mouth. The third is the cat vowel /æ/, which a lot of languages simply don’t have. It sits lower than the bed vowel and much further forward than the father vowel, and it’s the sound that keeps cat from collapsing into cot (the father vowel) or ket (the bed vowel). If your cat keeps turning into one of its neighbors, you’re not alone, and it’s the highest-value single vowel to drill.
A caveat that belongs right here, because it’s the reason the “how many vowels” question has no clean answer. The father vowel /ɑ/ and the saw vowel /ɔ/ are merging across much of the United States. For a large share of American speakers, especially in the West and much of the interior, cot and caught are pronounced identically, and father and saw share one back vowel. For other speakers, mostly in the Northeast and the South, they stay distinct. Both are standard American English. If you can’t hear a difference between cot and caught from a given speaker, that speaker has the merger, and you can safely use one vowel for both.
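The descriptors from earlier (height, frontness, lip rounding, tenseness) can be treated as a small record per vowel, which makes it easy to see what separates any two of them. The sketch below is one rough way to do that, not a standard phonetic encoding: the class name `Vowel`, the function `contrast`, and the coarse category labels are assumptions made for illustration, and `tense` is set to `True` only where the chart explicitly says "tense".

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Vowel:
    ipa: str
    height: str    # coarse label: "high", "mid", "low-mid", or "low"
    backness: str  # "front", "central", or "back"
    rounded: bool
    tense: bool    # True only where the chart marks the vowel as tense

# Values transcribed loosely from the "Where it sits" column (a simplification).
SIMPLE_VOWELS = {
    "see": Vowel("i", "high", "front", False, True),
    "sit": Vowel("ɪ", "high", "front", False, False),
    "bed": Vowel("ɛ", "mid", "front", False, False),
    "cat": Vowel("æ", "low", "front", False, False),
    "father": Vowel("ɑ", "low", "back", False, False),
    "saw": Vowel("ɔ", "low-mid", "back", True, False),
    "book": Vowel("ʊ", "high", "back", True, False),
    "moon": Vowel("u", "high", "back", True, True),
    "fun": Vowel("ʌ", "mid", "central", False, False),
}

FEATURES = ("height", "backness", "rounded", "tense")

def contrast(a, b):
    """Return the feature names on which two anchor-word vowels differ."""
    va, vb = SIMPLE_VOWELS[a], SIMPLE_VOWELS[b]
    return [f for f in FEATURES if getattr(va, f) != getattr(vb, f)]

print(contrast("see", "sit"))     # ['tense']
print(contrast("moon", "book"))   # ['tense']
print(contrast("cat", "bed"))     # ['height']
print(contrast("cat", "father"))  # ['backness']
```

At this coarse level each of the troublesome pairs differs in exactly one feature, which is one way to picture why they are easy to merge. Real vowels differ by degrees (the sit vowel is also a little lower than see), so the labels are a memory aid rather than a measurement.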
The diphthongs — vowels that move
A diphthong is a vowel that doesn’t hold still. It starts in one position and glides to another inside a single syllable, so your mouth is in motion the whole time you’re saying it. Say day slowly and feel your jaw close slightly and your tongue rise toward the see position by the end. That movement is what makes it a diphthong; said as a flat, steady vowel, it sounds foreign even when the starting point is right.
| Sound | IPA | Anchor word | Respelling | The glide |
|---|---|---|---|---|
| DAY diphthong | /eɪ/ | day, way | ay | starts just above bed, glides up toward ee |
| MY diphthong | /aɪ/ | my, why | ahy | starts low, glides up toward ee |
| BOY diphthong | /ɔɪ/ | boy, toy | oy | starts rounded and back, glides up toward ee |
| NOW diphthong | /aʊ/ | now, how | ow | starts low, glides back toward oo |
| GO diphthong | /oʊ/ | go, row | oh | starts mid-back, glides up toward oo |
The thing to notice is that all five glide toward one of the two high corners of the mouth, ee or oo. That’s what a diphthong is doing: traveling from an open position toward a closed one. The most common learner mistake is to clip the glide and land on a single steady vowel, so day and go come out flat. That pure vowel is perfectly normal in languages like Spanish and French, whose mid vowels don’t glide, but an American ear is listening for the movement, and without it the word sounds off. The fix is to deliberately exaggerate the movement at first; the glide will shrink to a natural size on its own.
One more belongs in this family with an asterisk. The sound in cute, few, and use (the CUTE sound, /ju/) is counted among the diphthongs in many learner charts, including SayWaader’s. Strictly it’s a y-glide (the consonant /j/) followed by the moon vowel, rather than a single gliding vowel, but it behaves like a unit and it’s useful to learn alongside the others.
The r-colored vowels
This is where American English parts ways with non-rhotic accents. In a rhotic accent like General American, when a vowel and an R sit in the same syllable, the R doesn’t wait its turn as a separate sound. It fuses with the vowel, bending the whole vowel around the tongue position for the American R while you’re still in the middle of it. The result is a small set of vowels that carry their R inside them. British Received Pronunciation drops these syllable-final R’s; the American R kept inside the vowel is one of the strongest single markers of the accent.
| Sound | IPA | Anchor word | Respelling | Notes |
|---|---|---|---|---|
| CAR R-vowel | /ɑr/ | car, star, heart | ar | the father vowel with R |
| MORE R-vowel | /ɔr/ | more, four, door | or | the saw vowel with R |
| BIRD R-vowel | /ɜr/ | bird, word, first | ur | stressed; the pure R-as-a-vowel |
| MOTHER R-vowel | /ər/ | mother, better | er | unstressed; the r-colored schwa |
| HAIR R-vowel | /ɛr/ | hair, care, fair | air | the bed region with R |
| NEAR R-vowel | /ɪr/ | near, here, beer | eer | the sit region with R |
| TOUR R-vowel | /ʊr/ | tour, cure, jury | uur | the book region with R; rarest, and poor / sure have mostly merged into MORE |
The two that do the most work are the bird vowel and the mother vowel, which are the same r-colored sound under stress and without it (phoneticians often compress each into a single symbol, /ɝ/ for the stressed bird vowel and /ɚ/ for the unstressed mother one). Bird, word, first carry it stressed; the endless -er endings of mother, better, water, teacher carry it unstressed, where it’s just the schwa with R added on top. Both are pure American R doing the job of a vowel, and getting the tongue position right is the same problem as the consonant R. The full mechanics of that tongue position live in the article on the American R, and the unstressed -er ending is the r-colored corner of the schwa.
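With all three families laid out, the arithmetic behind the "how many vowels" question can be sketched as a small tally. This is only an illustration of the counting choices the chart describes: the function name `vowel_count` and its parameters are hypothetical, and the defaults come from the row counts of the three tables above plus the cute sound.

```python
def vowel_count(simple=9, core_diphthongs=5, count_cute=True,
                r_colored=7, count_r_colored=True, merge_cot_caught=False):
    """Tally the chart's vowels under a few counting choices."""
    total = simple + core_diphthongs
    if count_cute:
        total += 1          # treat /ju/ as a sixth diphthong
    if count_r_colored:
        total += r_colored  # treat vowel-plus-R as vowels in their own right
    if merge_cot_caught:
        total -= 1          # father and saw vowels share one slot
    return total

print(vowel_count())                                # 22
print(vowel_count(merge_cot_caught=True))           # 21
print(vowel_count(count_r_colored=False))           # 15
```

The same inventory yields 22, 21, or 15 depending on which judgment calls you make, which is why a single "correct" number is hard to pin down.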
The vowel that swallows the others
There’s one more vowel, and in raw frequency it beats every vowel above: the schwa, /ə/, the small neutral “uh” in the first syllable of about and the last of sofa. It’s the sound a syllable falls to when it loses its stress, the thing every other vowel collapses into. Photograph keeps a full cat vowel in its last syllable (FOH-tuh-graf); in photography that same syllable goes unstressed and the vowel dissolves to a schwa (fuh-TAH-gruh-fee). The vowel didn’t change because the letters changed. It changed because the stress moved.
This is why the chart above can feel like it’s describing a language you don’t quite hear in conversation. In running American speech, only the stressed syllables get to keep the full vowel from the chart. Everything unstressed reduces toward the schwa, which is how an American sentence ends up with a handful of clear vowels carrying the beat and a crowd of “uh”s filling the gaps between them. The schwa has its own full treatment in the schwa article; for the purposes of this chart, just know that the vowel you read off the page is the vowel a syllable has when it’s stressed, and that most syllables in real speech aren’t.
Why you can’t trust the spelling
The reason English vowels need a chart at all, when Spanish or Italian vowels barely do, is that English spelling stopped tracking English pronunciation centuries ago and never caught up. Two patterns cause most of the confusion.
One spelling, many sounds. The letter a is the cat vowel in cat, the father vowel in spa, the saw vowel in all, the day diphthong in table, the hair vowel in care, and a schwa in about. The letter cluster ou is one vowel in soup, another in out, another in though, another in touch, and yet another in could. You cannot read the vowel off the letters with any confidence.
One sound, many spellings. Run it the other way and it’s just as loose. The see vowel /i/ is spelled six different ways in see, sea, field, machine, key, and people. The day diphthong shows up as day, rain, eight, they, and break. Same vowel every time, five or six costumes.
So when you meet a new English word, don’t reason from its spelling to its sound. Check the vowel (a dictionary’s IPA, the respelling in an app, or your ear on a native recording) and attach the sound to the word directly. The chart above gives you the small set of targets; the spelling is just the unreliable label on the outside.
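The two patterns above amount to a many-to-many relationship between spellings and sounds, and a short sketch makes that concrete. The words come from the examples in this section; the way each word is split into a spelling (for instance `i` for machine) and the vowel names attached to the ou words are my own reading of the chart rather than something the paragraphs state, so treat them as assumptions.

```python
from collections import defaultdict

# (spelling, example word, sound) triples based on the examples above.
EXAMPLES = [
    ("a", "cat", "cat vowel"),
    ("a", "spa", "father vowel"),
    ("a", "all", "saw vowel"),
    ("a", "table", "day diphthong"),
    ("a", "care", "hair vowel"),
    ("a", "about", "schwa"),
    ("ou", "soup", "moon vowel"),
    ("ou", "out", "now diphthong"),
    ("ou", "though", "go diphthong"),
    ("ou", "touch", "fun vowel"),
    ("ou", "could", "book vowel"),
    ("ee", "see", "see vowel"),
    ("ea", "sea", "see vowel"),
    ("ie", "field", "see vowel"),
    ("i", "machine", "see vowel"),
    ("ey", "key", "see vowel"),
    ("eo", "people", "see vowel"),
]

sounds_by_spelling = defaultdict(set)
spellings_by_sound = defaultdict(set)
for spelling, _word, sound in EXAMPLES:
    sounds_by_spelling[spelling].add(sound)
    spellings_by_sound[sound].add(spelling)

print(len(sounds_by_spelling["a"]))             # 6
print(len(sounds_by_spelling["ou"]))            # 5
print(sorted(spellings_by_sound["see vowel"]))  # ['ea', 'ee', 'eo', 'ey', 'i', 'ie']
```

Neither lookup direction gives a single answer, which is the practical reason to store the sound with the word instead of deriving it from the letters.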
Practice phrases
Read each line out loud, twice, slowly. These sentences are loaded with the contrasts that matter most: the tense–lax pairs, the cat vowel, the diphthong glides, and the r-colored set. The tricky vowels are marked in respelling.
- Did you see the ship leave? → Did you SEE the SHIHP leave?
- The pool is full by noon. → The POOL is FUUL by NOON.
- I can't catch the last cab. → I KANT KACH the LAST KAB.
- She bought a small ball. → She BAWT a SMAWL BAWL.
- Look at the full moon tonight. → LUUK at the FUUL MOON tonight.
- My boy found a toy downtown. → MAHY BOY found a TOY downtown.
- The bird heard the word first. → The BURD HURD the WURD FURST.
- Her father parked the car. → Her FAH-ther parked the KAR.
- Go slow on the open road. → GOH SLOH on the open ROHD.
- Take the same way home today. → TAYK the SAYM WAY home toDAY.
If a line feels like work, slow it down until each vowel is fully formed, then bring it back up to speed. The goal is to be able to produce the contrast on demand, so your ship never lands on sheep when it counts.
How different first languages handle this
Your starting point depends mostly on how many vowels your first language draws lines between. A language with a five-vowel system has to map English’s roughly twenty sounds onto five buckets, so several English vowels collapse together; a language with a rich vowel inventory has fewer of these collisions but its own specific gaps.
| Your L1 | Vowel inventory vs English | What to focus on |
|---|---|---|
| Spanish | ✗ Five vowels: a e i o u, each one steady and pure; no tense–lax pairs, and the mid vowels e and o stay pure where English glides them (day, go) | The highest-collision starting point. Drill the tense–lax pairs first (sheep/ship, fool/full), then the cat vowel, then keeping the diphthong glides from flattening. |
| Italian | ✗ Seven vowels: has the bed-vs-day and saw-vs-go openness contrasts, but no tense–lax quality contrast and no cat vowel | Similar to Spanish. The cat vowel and the sit/book lax vowels are the new targets; the glides need protecting too. |
| Japanese | ✗ Five vowels: a i u e o, with a separate long/short length contrast that doesn’t line up with English tense–lax | The lax vowels (sit, book) and the cat vowel are the gaps. Resist mapping English length onto Japanese length; the difference is mostly tongue position, not duration. |
| Mandarin Chinese | ~ Mid-size, very different shape: a handful of vowel phonemes with many context-dependent variants; the 儿化 (erhua) r-ending gives some familiarity with r-colored finals | The front pile-up is the work: see/sit, bed/cat. The r-colored vowels usually come a little easier here thanks to erhua, though the Mandarin rhotic isn’t identical to the American one. |
| Korean | ~ Seven or eight vowels: a fairly rich system, but the see/sit region and the bed/cat region each tend to collapse toward one sound | Focus on splitting the two front pairs and on the cat vowel. Diphthongs are mostly fine. |
| Hindi | ~ Rich, quality-based pairs: ten or eleven vowels whose short/long pairs differ in quality much like English tense–lax; but cat often lands on bed or father, and the saw/go pair blurs | The cat vowel and the saw-vs-go contrast are the main targets. The native quality-based pairs are an asset for hearing English tense–lax. |
| Arabic | ✗ Three qualities: a i u, each short and long, so the whole crowded English front region (see/sit/bed/cat) maps onto roughly one or two buckets | The front vowels need pulling apart one pair at a time. Expect sit, bed, and cat to start out sounding alike, and separate them deliberately. |
| French | ~ Rich, but monophthongal: many vowels, including front rounded ones English lacks, but French vowels are pure, with no gliding diphthongs | The inventory helps with the simple vowels; the work is the glides. Day and go must move, not stay flat. Watch the sit and book lax vowels too. |
| German | ✓ Rich, with tense–lax: has tense–lax pairs (bieten/bitten) and several English diphthongs already, the biggest head start of any major L1 | Mostly fine-tuning. The cat vowel is the notable gap, and the r-colored vowels need a fully pronounced American R rather than the dropped, vowel-like syllable-final R of German (the -er of Vater is a vowel, not a rhotic). |
| Portuguese (Brazilian) | ~ Seven oral vowels plus nasals: open/close mid contrasts help, but no English-style tense–lax pairs, and word-final vowels tend to raise | The lax vowels (sit, book) and the cat vowel are the targets; keep final vowels from drifting up toward ee and oo. |
| Russian | ~ Five or six vowels: heavy unstressed reduction; the see/sit and cat distinctions aren’t native | Split see from sit and build the cat vowel. The reduction habit actually helps with English schwa, even though the full vowels need work. |
The pattern across the table is that the front of the mouth is where the trouble concentrates for almost everyone. The see/sit pair, the bed/cat pair, and the cat vowel on its own are the contrasts most languages don’t draw, so they’re where the time pays off fastest. None of this is a flaw in your ear. These are just distinctions English draws that your first language doesn’t, and you can learn to hear and make them.
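The "five buckets" idea can be pictured as a nearest-neighbor problem: place each English vowel in a two-dimensional space and assign it to the closest vowel of a five-vowel language. The sketch below is a toy model under loud assumptions. Every coordinate is invented for illustration (these are not measured values), the names `nearest_bucket` and `collapse` are hypothetical, and real perception depends on much more than two numbers.

```python
import math
from collections import defaultdict

# Toy positions as (backness, height): 0 = front, 2 = back; 0 = low, 3 = high.
# All numbers are invented for illustration, not measured.
FIVE_VOWEL_L1 = {
    "i": (0.0, 3.0),
    "e": (0.0, 1.5),
    "a": (1.0, 0.0),
    "o": (2.0, 1.5),
    "u": (2.0, 3.0),
}

ENGLISH_SIMPLE = {
    "see": (0.0, 3.0),
    "sit": (0.3, 2.6),
    "bed": (0.2, 1.4),
    "cat": (0.2, 0.4),
    "father": (1.6, 0.0),
    "saw": (1.9, 0.9),
    "book": (1.7, 2.6),
    "moon": (2.0, 3.0),
    "fun": (1.1, 0.8),
}

def nearest_bucket(position, buckets):
    """Name of the L1 vowel closest to the given position."""
    return min(buckets, key=lambda name: math.dist(position, buckets[name]))

def collapse(english, buckets):
    """Group English anchor words by the L1 vowel each one lands on."""
    groups = defaultdict(list)
    for anchor, position in english.items():
        groups[nearest_bucket(position, buckets)].append(anchor)
    return dict(groups)

groups = collapse(ENGLISH_SIMPLE, FIVE_VOWEL_L1)
for l1_vowel, anchors in groups.items():
    print(l1_vowel, anchors)
```

With these made-up positions, see and sit share the i bucket, book and moon share the u bucket, and cat lands in the same bucket as father. That mirrors the collisions described in the table, but the grouping depends entirely on the chosen coordinates, so read it as a picture of the idea and nothing more.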
FAQ
How many vowel sounds does American English have?
Around twenty, but the exact count is genuinely debated and depends on how you count. Many references land around fifteen if you count the simple vowels and diphthongs and treat the r-colored vowels as vowel-plus-R rather than separate sounds. SayWaader’s sound library counts twenty-two: nine simple vowels, six diphthongs, and seven r-colored vowels. Two specific judgment calls move the number: many Americans merge the vowels in cot and caught (which removes one), and the schwa /ə/ and the vowel in fun are often treated as a single sound that changes with stress.
What’s the difference between a monophthong and a diphthong?
A monophthong is a vowel that holds one steady position the whole time you say it, like the vowel in see or cat. A diphthong is a vowel that glides from one position to another within a single syllable, like the vowel in day (which moves up toward an ee) or now (which moves back toward an oo). American English has nine monophthongs and five core diphthongs (six if you also count the cute sound, /ju/). The most common mistake learners make with diphthongs is clipping the glide so the vowel comes out flat.
Why do cot and caught sound the same for some Americans?
Because of the cot–caught merger, a sound change in which the father vowel /ɑ/ and the saw vowel /ɔ/ have collapsed into one back vowel. It’s widespread across the western and interior United States, where cot and caught, don and dawn, are pronounced identically. Speakers in parts of the Northeast and the South still keep them distinct. Both are standard General American, so if a particular speaker merges them, you can safely use a single vowel for both.
Which American English vowel is hardest for learners?
For most learners it’s the cat vowel /æ/, the sound in cat, bad, and map. Many languages don’t have a vowel in that exact spot, so it gets pulled toward the bed vowel (cat sounds like ket) or the father vowel (cat sounds like cot). The tense–lax pairs, see-vs-sit at the front and moon-vs-book at the back, are close behind, because they ask you to split one vowel into two.
Is an r-colored vowel a single sound or a vowel plus R?
Both descriptions are used. In a rhotic accent like General American the R fuses so completely with the preceding vowel that phoneticians often write the result as a single r-colored vowel (the /ɝ/ in bird, the /ɚ/ in mother). Other analyses treat it as an ordinary vowel followed by the consonant R. For a learner the distinction doesn’t matter much; what matters is that the R changes the vowel and, in American English, is always pronounced. See the article on the American R for the tongue mechanics.
Do I need to learn the IPA to use this chart?
It helps, but it isn’t required. The anchor words do most of the work: if you tie each vowel to a word you can already say (see, cat, moon), you can look up any new word’s vowel by matching it to one of them. The IPA symbol is useful mainly for reading dictionary entries, where /ɪ/ versus /i/ tells you instantly whether a word takes the sit vowel or the see vowel.
The five vowel letters are a historical accident, and they will keep lying to you for as long as you read English through them. You don’t have to memorize twenty new symbols; just anchor each vowel to a word you already own. Pick the two or three contrasts your first language doesn’t draw, drill them until the difference is automatic, and let the rest of the chart be a reference you return to when a word surprises you.