🐟 Fish Audio S2 Pro
State-of-the-Art Dual-Autoregressive Text-to-Speech · Model Page ↗ · GitHub ↗
80+ languages supported · Zero-shot voice cloning · 15,000+ inline emotion tags
🏷️ Supported Emotion Tags
15,000+ unique tags are supported. You can also use free-form descriptions such as [whisper in small voice] or [professional broadcast tone].
Common tags:
[pause] [emphasis] [laughing] [inhale] [chuckle] [tsk] [singing] [excited] [laughing tone] [interrupting] [chuckling] [excited tone] [volume up] [echo] [angry] [low volume] [sigh] [low voice] [whisper] [screaming] [shouting] [loud] [surprised] [short pause] [exhale] [delight] [panting] [audience laughter] [with strong accent] [volume down] [clearing throat] [sad] [moaning] [shocked]
🌍 Supported Languages
Tier 1: Japanese · English · Chinese
Tier 2: Korean · Spanish · Portuguese · Arabic · Russian · French · German
Also supported: sv, it, tr, no, nl, cy, eu, ca, da, gl, ta, hu, fi, pl, et, hi,
la, ur, th, vi, jw, bn, yo, sl, cs, sw, nn, he, ms, uk, id, kk, bg, lv, my, tl, sk, ne, fa,
af, el, bo, hr, ro, sn, mi, yi, am, be, km, is, az, sd, br, sq, ps, mn, ht, ml, sr, sa, te,
ka, bs, pa, lt, kn, si, hy, mr, as, gu, fo, and more.
Language is auto-detected from the input text — no configuration needed.
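The bracketed emotion tags described above are written directly into the input text. As a rough illustration of that syntax, here is a minimal Python sketch that pulls tags out of a prompt before it is submitted. It is not part of Fish Audio's tooling: the helper names (extract_tags, strip_tags, free_form_tags) and the COMMON_TAGS sample are hypothetical, and the sketch assumes tags are simple, non-nested square-bracket spans.

```python
import re
from typing import List

# Assumption: a tag is any non-nested [ ... ] span, e.g. [whisper]
# or [professional broadcast tone].
TAG_PATTERN = re.compile(r"\[([^\[\]]+)\]")

# A small sample of the common tags listed on this page (not the full set).
COMMON_TAGS = {"pause", "emphasis", "laughing", "whisper", "sigh", "excited"}


def extract_tags(text: str) -> List[str]:
    """Return the tag names found in text, in order of appearance."""
    return [m.group(1).strip() for m in TAG_PATTERN.finditer(text)]


def strip_tags(text: str) -> str:
    """Return the spoken text with tags removed and whitespace collapsed."""
    without_tags = TAG_PATTERN.sub(" ", text)
    return " ".join(without_tags.split())


def free_form_tags(text: str) -> List[str]:
    """Return tags that are not in the COMMON_TAGS sample above."""
    return [t for t in extract_tags(text) if t.lower() not in COMMON_TAGS]


if __name__ == "__main__":
    sample = (
        "[whisper] I have a secret. [pause] "
        "[professional broadcast tone] And now, the news."
    )
    print(extract_tags(sample))
    print(strip_tags(sample))
    print(free_form_tags(sample))
```

A check like this can help spot an unclosed bracket or an unintended tag before generating audio. Free-form tags are valid input, so free_form_tags only reports them for review and does not mark them as errors.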