Most "text-to-speech" reviews are written by people who only ever feed the tool English. Chinese is a different problem entirely. A Mandarin voice has to get the tones right, pick the correct reading of polyphonic characters (多音字), split an unbroken run of characters into words with no spaces to guide it, and (the part almost everyone gets wrong) switch cleanly into English mid-sentence when you write "用 ChatGPT 写了个 prompt." ("wrote a prompt with ChatGPT"). I read a lot in Chinese: 微信读书 (WeRead) on the train, long 知乎 (Zhihu) answers over lunch, the occasional arXiv paper that's half in English. So I spent a couple of weeks running my actual reading through the tools that claim to handle Chinese. Below is what holds up, with honest notes on where each one falls apart.
What makes Chinese TTS hard (and what to listen for)
Before the tools, here's what separates a usable Mandarin voice from a robotic one, so you know what you're judging:
- Tones and prosody. Bad engines read every sentence with the same flat intonation, which is exhausting in Mandarin because tone carries meaning. A good voice has natural sentence melody: it sounds like a person reading, not a subway announcement.
- Polyphonic characters (多音字). 行 is xíng or háng; 重 is zhòng or chóng; 还 is hái or huán. Weak engines guess from frequency, so they constantly misread the less common reading. For example, they say xíng in 银行 (yínháng, "bank"). The better neural voices use context and are right far more often.
- Word segmentation. Chinese has no spaces, so the engine has to decide where words break. Get it wrong and you get unnatural pauses or run-on phrasing.
- Code-switching (中英混读). This is the real torture test in 2026. Real Chinese writing is full of English: brand names, tech terms, 你的 KPI ("your KPI"), 跑个 demo ("run a demo"). Most tools either spell English words out letter by letter in a Chinese accent ("C-H-A-T") or stutter at every switch. The few that handle it gracefully are the ones worth keeping.
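Word segmentation and 多音字 are closely linked, because the reading of 行 or 还 usually depends on which word it belongs to. Here is a toy Python sketch of that link. It uses forward maximum matching, a classic dictionary-based segmentation baseline and not necessarily what any tool below uses, over a tiny hand-written lexicon. The lexicon, its readings, and the function names are illustrative assumptions. Real engines use far larger dictionaries plus statistical or neural models.

```python
# Toy lexicon: multi-character words carry their context-specific readings.
# These entries are illustrative, not a real TTS dictionary.
WORD_READINGS = {
    "银行": ["yín", "háng"],
    "行走": ["xíng", "zǒu"],
    "重要": ["zhòng", "yào"],
    "重新": ["chóng", "xīn"],
    "还钱": ["huán", "qián"],
    "还是": ["hái", "shì"],
}

# Fallback: the single most frequent reading of each character, which is
# effectively what a "guess from frequency" engine uses everywhere.
CHAR_DEFAULTS = {
    "我": "wǒ", "去": "qù", "银": "yín", "行": "xíng", "走": "zǒu",
    "重": "zhòng", "要": "yào", "新": "xīn", "还": "hái", "钱": "qián",
    "是": "shì",
}

MAX_WORD_LEN = max(len(w) for w in WORD_READINGS)


def segment(text: str) -> list[str]:
    """Forward maximum matching: at each position, take the longest lexicon word."""
    words = []
    i = 0
    while i < len(text):
        for size in range(min(MAX_WORD_LEN, len(text) - i), 0, -1):
            candidate = text[i:i + size]
            # A single character is always accepted, so the loop always advances.
            if size == 1 or candidate in WORD_READINGS:
                words.append(candidate)
                i += size
                break
    return words


def read_with_context(text: str) -> str:
    """Use word-level readings where a lexicon word matched, else per-character defaults."""
    syllables = []
    for word in segment(text):
        if word in WORD_READINGS:
            syllables.extend(WORD_READINGS[word])
        else:
            syllables.append(CHAR_DEFAULTS.get(word, "?"))
    return " ".join(syllables)


def read_char_by_char(text: str) -> str:
    """Frequency-only baseline: every character gets its most common reading."""
    return " ".join(CHAR_DEFAULTS.get(ch, "?") for ch in text)
```

Take 我去银行还钱 ("I'm going to the bank to pay back money") as an example. read_char_by_char gives wǒ qù yín xíng hái qián, which misreads both 行 and 还. read_with_context gives wǒ qù yín háng huán qián. Greedy maximum matching also fails on some ambiguous strings, which is part of why production engines rely on statistical or neural models instead.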
When you trial any tool, paste in one mixed paragraph with a 多音字 or two and just listen. You'll know in fifteen seconds.
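To make that fifteen-second test a little more systematic, here is a small Python sketch with two jobs. It splits a line into Chinese and English runs, which are the points where a voice has to switch. It also checks whether a candidate test paragraph contains both a switch and one of the 多音字 listed above. The function names are hypothetical, and the script detection is a simplifying assumption: only the basic CJK Unified Ideographs block (U+4E00 to U+9FFF) counts as Chinese, only ASCII letters count as English, and spaces, digits, and punctuation attach to the preceding run.

```python
from typing import Optional

# The 多音字 examples from the list above.
POLYPHONES = {"行", "重", "还"}


def script_of(ch: str) -> Optional[str]:
    """Label a character "zh", "en", or None for neutral (space, digit, punctuation)."""
    if "\u4e00" <= ch <= "\u9fff":
        return "zh"
    if ch.isascii() and ch.isalpha():
        return "en"
    return None


def split_runs(text: str) -> list[tuple[str, str]]:
    """Split text into (language, chunk) runs; neutral characters join the preceding run."""
    runs: list[list[str]] = []  # each item is [label, accumulated text]
    pending = ""  # neutral characters seen before the first lettered run
    for ch in text:
        label = script_of(ch)
        if label is None:
            if runs:
                runs[-1][1] += ch
            else:
                pending += ch
        elif runs and runs[-1][0] == label:
            runs[-1][1] += ch
        else:
            runs.append([label, pending + ch])
            pending = ""
    return [(label, chunk.strip()) for label, chunk in runs]


def count_switches(text: str) -> int:
    """Number of Chinese/English boundaries the voice has to cross."""
    return max(0, len(split_runs(text)) - 1)


def is_good_test_paragraph(text: str) -> bool:
    """True if the text has at least one language switch and one polyphonic character."""
    labels = {label for label, _ in split_runs(text)}
    has_switch = {"zh", "en"} <= labels
    has_polyphone = any(ch in POLYPHONES for ch in text)
    return has_switch and has_polyphone
```

For "用 ChatGPT 写了个 prompt", split_runs returns four runs and count_switches returns 3. However, is_good_test_paragraph returns False because the line contains no 多音字. Prefixing 我还是 ("I still") makes it a good test line.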
CastReader — read Chinese pages aloud in place (and it's free)
Full disclosure: we build CastReader, so weigh this how you like. But it exists to solve the thing that annoyed me about every other option: reach, meaning whether a tool can get at the text where it actually lives. Most Chinese TTS tools are a box you paste text into. CastReader is a free-to-use reader (a Chrome/Edge extension plus native Mac, iOS, and Android apps) that reads what's already on your screen, in place, without copy-paste. There's no signup and no expiring trial, and the natural neural Mandarin voices are free. CastReader Pro adds premium ultra-realistic voices, more listening hours, and AI document analysis.
The reason I reach for it on Chinese content is that it reads the apps I actually read in. It reads 微信读书 / WeRead aloud right in the browser reader, and it handles long 知乎 / Zhihu answers, the kind of 3,000-character (字) answer you don't want to scroll through. Because it reads any web page, it also works on the China-native sites that no Western tool bothers with: 豆瓣 / Douban, 简书 / Jianshu, and 公众号 / WeChat Official Account articles opened in a browser. It also reads long AI threads from Claude, ChatGPT, and Gemini end to end. That matters because many Chinese-language AI chats are exactly the half-Chinese, half-English code-switching that breaks weaker engines. It can also turn a PDF into an audiobook or an EPUB into audio for offline listening.
On the two hardest tests above, it does well. Code-switching is smooth (it doesn't spell English words out letter by letter), and 多音字 accuracy is solid in context. Adjust the speed to taste; Mandarin at 1.4x is my walking pace.
When not to use it: you need to produce and export an MP3 of branded narration for a published video. That's a job for a creator tool, not a reader. For actually getting through your Chinese reading list, it's the one I keep open. Install it from the Chrome Web Store, the App Store, or Google Play, or download the Mac app.
Microsoft Edge "Read Aloud" — the best free Chinese voices already on your PC
This is the one most people overlook, and it's genuinely good. Edge's built-in Read Aloud (Ctrl+Shift+U) ships with Microsoft's neural Chinese voices, and Xiaoxiao (晓晓) and Yunxi (云希) are the standouts. They're among the most natural Mandarin voices you can use for free, with no install and word-by-word highlighting. For "read this article while I cook," it's hard to beat.
Edge also offers a healthy roster of regional voices, so you can choose between 普通话 (Standard Mandarin), 粤语 (Cantonese), and 台湾国语 (Taiwanese Mandarin). Tone and prosody on Xiaoxiao are legitimately pleasant.
Where it stops: it only reads ordinary web pages. It won't reliably touch a PDF, a logged-in reader frame like WeRead, or dynamic app content, and it lives only inside Edge. As a free baseline for plain articles, though, it's excellent — and the underlying Azure voices are the same ones many paid tools quietly resell.
The browser/OS built-ins (free, already on your device)
Don't sleep on what you already own:
- macOS speaks selected Chinese text via System Settings → Accessibility → Spoken Content. The bundled voices Tingting (婷婷) and Sinji (Cantonese) are decent, fully offline, and free forever. They're great for a paragraph but clunky for a 40-page report, because reading is selection-based with no real queue.
- Chrome can read pages via extensions that tap the Web Speech API, but the quality depends entirely on which Chinese voice your OS provides, so results vary a lot machine to machine.
- iOS (Speak Screen) and Android (Select to Speak) both have system-level readers with Mandarin voices. They're fine in a pinch, but they offer no cross-app handoff and limited control.
These are perfect for a quick sentence and tiring for long-form. The moment you're reading a whole book or a logged-in app, you'll want a dedicated reader.
The "Chinese voice generator" tools (and when they're overkill)
There's a whole category aimed at creators who need to export polished Mandarin audio, such as 配音 (voice-over) for short videos. The big names with genuinely strong Chinese:
- Microsoft Azure TTS: the engine behind Edge's voices. It's pay-as-you-go (the standard neural tier is roughly $15 per 1M characters, with a free monthly allotment). It's the best value if you're technical and want raw API access to those Xiaoxiao/Yunxi voices.
- ElevenLabs: excellent multilingual voices, including Chinese, with emotional range and voice cloning. The free tier is capped (about 10k credits/month). Paid plans start at around $5/month (Starter), and Creator costs about $22/month. It's strongest for expressive, character-style narration.
- Speechify and NaturalReader both support Chinese, but their free tiers are demos — Speechify rations its good voices, and NaturalReader caps premium neural voices at roughly 20 minutes a day (paid plans from about $20.90/month). See our Speechify alternative and NaturalReader alternative breakdowns for the side-by-side.
These are the right tools when you're making audio to publish. They're overkill when you just want to listen to something, because you'll spend more time pasting text and managing exports than you save. Our AI voice generator guide goes deeper into what these engines can do.
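If you do take the raw Azure route, you mostly need three things: a voice name, a speaking rate, and a character budget. The Python sketch below builds a minimal SSML document for the zh-CN-XiaoxiaoNeural voice. It maps a player-style speed such as 1.3x to a relative prosody rate, and it estimates cost from the roughly $15 per 1M characters figure above. The helper names are hypothetical, and the free allotment is a parameter rather than a guess. Check Azure's current pricing, and how it counts Chinese characters and SSML markup, before budgeting. You would hand the resulting string to Azure's SSML synthesis endpoint or Speech SDK; the network call itself is left out.

```python
from xml.sax.saxutils import escape

# Voice name as it appears in Azure's zh-CN neural voice list.
DEFAULT_VOICE = "zh-CN-XiaoxiaoNeural"

# The article's rough figure for the standard neural tier, in USD.
# Assumption: verify against Azure's current pricing page.
PRICE_PER_MILLION_CHARS = 15.0


def speed_to_rate(speed: float) -> str:
    """Map a player-style multiplier (1.3 means 1.3x) to a relative SSML rate such as "+30%"."""
    percent = round((speed - 1.0) * 100)
    return f"{percent:+d}%"


def build_ssml(text: str, voice: str = DEFAULT_VOICE, speed: float = 1.0) -> str:
    """Wrap text in a minimal SSML document: one voice, one prosody rate, XML-escaped text."""
    return (
        '<speak version="1.0" xmlns="http://www.w3.org/2001/10/synthesis" '
        'xml:lang="zh-CN">'
        f'<voice name="{voice}">'
        f'<prosody rate="{speed_to_rate(speed)}">{escape(text)}</prosody>'
        "</voice></speak>"
    )


def estimate_cost(num_chars: int, free_chars: int = 0,
                  price_per_million: float = PRICE_PER_MILLION_CHARS) -> float:
    """Estimated monthly USD cost; free_chars is whatever allotment your plan includes."""
    billable = max(0, num_chars - free_chars)
    return billable * price_per_million / 1_000_000
```

At the article's rate, estimate_cost(200_000) puts a 200,000-character book at $3.00 before any free allotment. Keeping the voice and rate in SSML, rather than in per-request settings, means the "pick the voice once" advice below carries over directly.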
How to actually use Chinese TTS in your day
A few habits that made Chinese listening genuinely stick for me, rather than a novelty I tried once:
- Pick the voice once, then forget it. For neutral 普通话, Xiaoxiao (Edge/Azure) or CastReader's default neural voice are both safe. Don't tool-hop — consistency is what makes long listening comfortable.
- Start slower than you think. Mandarin packs a lot of information per syllable; I read English at 1.8x but Chinese at 1.3–1.4x. Creep up over a week.
- Feed it the right content type. For 微信读书 and 知乎, read in place rather than copy-pasting a 3,000-character answer. For a paper or report, convert the PDF to an audiobook so you can pause and resume.
- Use it for the hard reading. Listening while following along on the page is genuinely good for retention on dense material, such as classical Chinese, legal text, or anything you'd otherwise re-read three times. That's the whole idea behind text-to-speech for students.
- Test code-switching with YOUR text. If you write a lot of half-English Chinese, paste one of your real paragraphs into any tool before committing. The demo paragraphs are always cherry-picked.
For anything that isn't a plain article — a logged-in app, a PDF, an AI chat — a dedicated free text-to-speech reader will save you the constant copy-paste that kills the habit.
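The "pause and resume" habit depends on long text being read in manageable pieces. CastReader and the other tools here don't necessarily work this way, but a minimal Python sketch of the idea looks like this. It splits Chinese text after sentence-ending punctuation (。！？ plus their ASCII equivalents) and greedily packs whole sentences into chunks, so a bookmark can be just a chunk index. The function names and the 200-character default are illustrative assumptions.

```python
import re

# Split right after sentence-ending punctuation. The lookbehind keeps each
# mark attached to the sentence it ends.
SENTENCE_END = re.compile(r"(?<=[。！？!?])")


def split_sentences(text: str) -> list[str]:
    """Split text into sentences, dropping empty or whitespace-only pieces."""
    return [s for s in SENTENCE_END.split(text) if s.strip()]


def pack_chunks(text: str, max_chars: int = 200) -> list[str]:
    """Greedily pack whole sentences into chunks of at most max_chars.

    A single sentence longer than max_chars becomes its own oversized chunk
    rather than being cut mid-sentence.
    """
    chunks: list[str] = []
    current = ""
    for sentence in split_sentences(text):
        if current and len(current) + len(sentence) > max_chars:
            chunks.append(current)
            current = ""
        current += sentence
    if current:
        chunks.append(current)
    return chunks
```

To resume, a reader would restart from chunks[bookmark] instead of the top of the document. Cutting only at sentence boundaries is meant to keep each chunk's phrasing natural. The trade-off is an occasional oversized chunk when a sentence runs very long.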
FAQ
What's the most natural free Chinese TTS voice?
Microsoft's neural voices, Xiaoxiao (晓晓) and Yunxi (云希), are the most natural you can use for free. They're available right inside Edge's Read Aloud and through Azure's free tier. CastReader also uses neural Mandarin voices and is free to use with no signup. Its advantage is that it reads pages in place instead of making you paste text.
Can text-to-speech read mixed Chinese and English?
Some can, but most can't do it gracefully. Weaker engines spell English words out letter by letter or stutter at every switch. Always test with one of your own mixed paragraphs (something with 你的 KPI、跑个 demo in it). Fifteen seconds of listening tells you everything.
Can I listen to WeRead or Zhihu aloud?
Yes, with a reader that works on the page itself rather than a paste box. CastReader reads WeRead and Zhihu directly in the browser, including long-form answers, without copy-pasting.
Does Chinese TTS handle polyphonic characters (多音字) correctly?
Modern neural engines use context and get characters like 行, 重, and 还 right far more often than older rule-based ones, but none are perfect — a rare name or an unusual phrase will still trip them up occasionally. The better the voice, the rarer the mistakes.
Is there a free option with no trial?
Yes. Edge Read Aloud and your OS's built-in voices are free forever for plain text. CastReader is free to use across Chrome/Edge, Mac, iOS, and Android — no signup, no trial that expires — and reads Chinese apps and pages in place; CastReader Pro is an optional upgrade for premium ultra-realistic voices, more listening hours, and AI document analysis. Questions: support@castreader.ai.