Place your tongue tip near the roof of your mouth behind your top teeth. Push air through the narrow gap. No voicing.

Americans pronounce "cities" as SIH-teez (/ˈsɪɾiz/). The "t" sits between two vowels, so it sounds like a quick "d": the tongue briefly taps the ridge behind the upper teeth. This is called the Flap T, a sound shift that makes everyday speech feel effortless. Stress falls on the first syllable, so keep the second one short and quick. You'll hear it in sentences like "I enjoy urban sketching when I visit new cities" or "Autonomous vehicles are being tested in several major cities"; more examples appear below.
2 syllables, 5 sounds.
Quickly bounce the front of your tongue against the roof of your mouth. Don't stop the airflow — just a quick tap.

Pull the corners of your lips back slightly. Arch the middle-front of your tongue high toward the roof of the mouth.

Same position as S, but add vocal cord vibration. Feel the buzz.

The textbook pronunciation isn't wrong; it's just not how most Americans say the word in everyday speech.
In "cities", the "t" falls between two vowels. In that position, /t/ or /d/ becomes a quick tap [ɾ] that sounds like a soft "d": the tongue briefly taps the ridge behind the upper teeth.
Stress falls on the first syllable, not the second. Stretch SIH and keep the second syllable short and quick.