Americans pronounce "gestures" as JEHS-cherz (/ˈʤɛsʧərz/). Stress falls on the first syllable; keep the second syllable short and quick.
Record yourself saying "gestures" and play it back.
2 syllables, 6 sounds.
/ʤ/: Touch the front of your tongue to the roof of your mouth, then release into a 'zh' position. Add vocal cord vibration.

/ɛ/: Drop your jaw moderately. Touch the tongue tip behind the bottom front teeth and lift the mid-front part slightly toward the roof.

/s/: Place your tongue tip near the roof of your mouth behind your top teeth. Push air through the narrow gap. No voicing.

The textbook way isn't wrong — it's just not how anyone actually says it.
Stress falls on the first syllable, not the second. Stretch JEHS and keep the second syllable short and quick.
Americans use a relaxed retroflex R — the tongue curls back rather than rolling. The R is one continuous sound with the vowel before it, not two separate sounds.