Relax your lips and drop your jaw significantly. The tongue tip lightly touches behind the bottom front teeth and the back part of the tongue presses down a little to create more dark space in the back of the mouth.

Americans pronounce "auditory" as AH-duh-tor-ee (/ˈɑdəˌɾɔri/). In "auditory", the "t" between vowels sounds like a quick "d": the tongue briefly taps the ridge behind the upper teeth. This is called the Flap T, a sound shift that helps everyday speech flow. So instead of a crisp, fully released "t", you get a light tap: AH·duh·tor·ee. Stress falls on the first syllable, so keep everything else short and quick. You'll hear it in sentences like "I discovered that I am an auditory learner through experience."
Record yourself saying "auditory" and play it back.
4 syllables, 6 sounds.
The textbook way isn't wrong — it's just not how anyone actually says it.
In "auditory", the "t" between vowels becomes a quick tap [ɾ] that sounds like a soft "d": the tongue briefly taps the ridge behind the upper teeth. The same tap can replace either /t/ or /d/.
Stress falls on the first syllable. Stretch AH and keep everything else short and quick.
Don't pronounce the second syllable too fully. This unstressed syllable reduces to a schwa, the lazy "uh" sound, in casual speech.
Americans use a relaxed retroflex R — the tongue curls back rather than rolling. The R is one continuous sound with the vowel before it, not two separate sounds.