Edge TTS: The Free Text-to-Speech Engine Nobody Talks About
What You’ll Need
- n8n Cloud or self-hosted n8n instance
- Hetzner VPS or Contabo VPS if self-hosting
- Python 3.8+ installed locally or on your server
- Basic command-line familiarity
- A text file or API endpoint with content to synthesize
Table of Contents
- What Is Edge TTS and Why Should You Care?
- Setting Up Edge TTS on Your Machine
- Integrating Edge TTS with n8n
- Building a Production Workflow
- Advanced: Handling Large-Scale Audio Generation
- Getting Started
What Is Edge TTS and Why Should You Care?
I discovered Edge TTS completely by accident while hunting for a free text-to-speech solution that didn’t require expensive API keys or sketchy third-party services. What I found was Microsoft’s hidden gem—a legitimate TTS engine that powers Microsoft Edge’s built-in reader functionality.
Edge TTS is the text-to-speech engine that runs inside Microsoft Edge’s “Read aloud” feature. It’s been available for years, but almost nobody talks about it because Microsoft doesn’t officially document it as a public API. The community reverse-engineered it, wrapped it in Python libraries, and now you can use it for free without authentication, rate limits, or corporate approval.
Here’s what makes it genuinely special:
Natural voice quality. Edge TTS produces audio that sounds like actual humans, not the robotic Stephen-Hawking-era synth voices you’d expect from free tools. I’ve tested dozens of voices across 50+ languages, and the clarity is genuinely impressive.
No API key required. Unlike Google Cloud TTS (pay-as-you-go), AWS Polly (AWS account mandatory), or OpenAI’s TTS, Edge TTS asks for nothing. No authentication. No billing. No rate-limit emails at 3 AM.
200+ voice options. Microsoft has trained models for various languages, accents, and genders. Want a British female voice? Australian male? Korean child voice? They’re all there.
Fast processing. On a decent internet connection, you’ll get audio files in seconds, not minutes.
Perfect for automation. This is where I use it most—building workflows that convert article text to audio, generate YouTube video voiceovers, create multilingual bot responses, or produce accessibility-friendly content automatically.
The catch? It’s technically against Microsoft’s terms of service (they say it’s for Edge’s reader feature only), but the community-maintained Python libraries have been stable for three years without takedown notices. If you’re building commercial products, be aware of that risk.
Setting Up Edge TTS on Your Machine
Let me walk you through the simplest possible setup.
First, install the edge-tts Python package:
pip install edge-tts
That’s it. You’re done with setup. No credentials, no config files, no environment variables to juggle.
Let’s test it immediately with a simple command-line call:
edge-tts --text "Hello, this is Edge TTS speaking" --write-media test_output.mp3 --voice en-US-AriaNeural
This does three things:
- Takes your text string
- Streams the synthesized MP3 audio from Microsoft's servers
- Saves it as test_output.mp3
Check your current directory—you’ll see a fresh MP3 file. Play it. If you’re used to older free TTS tools, the voice quality will surprise you.
Now let’s list all available voices:
edge-tts --list-voices
You’ll get output like this:
Name: af-ZA-AdriNeural
Name: af-ZA-WillemNeural
Name: am-ET-AmehaNeural
Name: am-ET-MekdesNeural
Name: ar-AE-FatimaNeural
Name: ar-AE-MohamedNeural
...and 200+ more
Pick a voice that matches your use case. For English, I usually default to en-US-AriaNeural (neutral American female) or en-GB-SoniaNeural (British female), but experiment.
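If you want to pick voices programmatically rather than eyeballing the CLI output, the library exposes the same list in Python via edge_tts.list_voices(). Here's a small sketch that filters by locale—the helper names are mine, not part of the library:

```python
from typing import Dict, List

def filter_voices(voices: List[Dict], locale: str) -> List[str]:
    """Return ShortNames of voices matching a locale, e.g. 'en-GB'.

    Each entry from edge_tts.list_voices() is a dict with keys such as
    ShortName, Locale, and Gender.
    """
    return [v["ShortName"] for v in voices if v["Locale"] == locale]

async def list_locale_voices(locale: str) -> List[str]:
    # Requires the edge-tts package and network access
    import edge_tts
    voices = await edge_tts.list_voices()
    return filter_voices(voices, locale)

# Usage: asyncio.run(list_locale_voices("en-GB"))
```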
Now let’s write actual Python code to integrate this into your workflows:
import edge_tts
import asyncio

async def text_to_speech(text, output_file, voice="en-US-AriaNeural", rate="+0%"):
    # rate is a signed percentage string, e.g. "-50%" or "+25%"
    communicate = edge_tts.Communicate(text=text, voice=voice, rate=rate)
    await communicate.save(output_file)
    return output_file

async def main():
    text_content = "Welcome to the world of automated audio generation. This is completely free and requires no API keys."
    output_path = "welcome_audio.mp3"
    voice_choice = "en-US-AriaNeural"
    result = await text_to_speech(text_content, output_path, voice_choice)
    print(f"Audio saved to {result}")

asyncio.run(main())
Run this script:
python tts_script.py
You’ll have a professional-quality MP3 in seconds.
The rate parameter controls speech speed and takes a signed percentage string: "-50%" for slower speech, "+0%" for normal, "+50%" for faster. Handy for accessibility or for creating background narration that matches video pacing.
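One detail worth pinning down: edge-tts expects the rate as a signed percentage string ("-50%", "+0%", "+25%"), not a bare number. A tiny helper (my own, not part of the library) keeps that formatting in one place:

```python
def rate_string(percent: int) -> str:
    """Format a speed adjustment the way edge-tts expects: '+0%', '-50%', '+25%'."""
    return f"{percent:+d}%"

# Usage with the library (assumes edge-tts is installed):
# communicate = edge_tts.Communicate(text=text, voice=voice, rate=rate_string(-25))
```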
Integrating Edge TTS with n8n
This is where automation gets powerful. If you’re running n8n Cloud or a self-hosted instance on a Hetzner VPS, you can trigger TTS generation from any workflow.
Create a new workflow in n8n Cloud and add an “Execute Command” node (or “Run Script” if you’re using the Python node in a self-hosted setup).
Here’s the configuration for a simple HTTP-triggered workflow:
Node 1: HTTP Request trigger
Set up a webhook that accepts POST requests with a JSON body containing your text and voice preference.
Node 2: Function node (execute Python)
import edge_tts
import asyncio
import base64

async def generate_audio(text_input, voice_name):
    output_file = f"/tmp/{voice_name}_output.mp3"
    communicate = edge_tts.Communicate(text=text_input, voice=voice_name, rate="+0%")
    await communicate.save(output_file)
    with open(output_file, "rb") as audio_file:
        audio_base64 = base64.b64encode(audio_file.read()).decode("utf-8")
    return audio_base64

# In n8n's Python Code node, the incoming webhook item is available via _input
body = _input.first().json["body"]
text = body["text"]
voice = body.get("voice", "en-US-AriaNeural")

audio_data = asyncio.run(generate_audio(text, voice))

# n8n expects a list of items, each wrapped in a "json" key
return [{"json": {
    "audio_base64": audio_data,
    "voice_used": voice,
    "text_length": len(text),
    "status": "success",
}}]
Node 3: Send the response back
Use an HTTP Response node to return the base64-encoded audio or save it to cloud storage.
When you trigger this workflow via HTTP POST:
curl -X POST http://your-n8n-instance/webhook/tts \
-H "Content-Type: application/json" \
-d '{
"text": "This workflow generated this audio automatically",
"voice": "en-GB-SoniaNeural"
}'
You’ll get back a response with the audio file encoded in base64, ready to play, embed, or store.
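On the client side you then need to turn that base64 payload back into a playable file. A minimal sketch using only the standard library—the field name audio_base64 matches the Function node above, so adjust it if yours differs:

```python
import base64
import json
import urllib.request

def decode_audio(response_json: dict, output_path: str) -> int:
    """Write the base64-encoded MP3 from the webhook response to disk.

    Returns the number of audio bytes written.
    """
    audio_bytes = base64.b64decode(response_json["audio_base64"])
    with open(output_path, "wb") as f:
        f.write(audio_bytes)
    return len(audio_bytes)

def request_tts(webhook_url: str, text: str, voice: str) -> dict:
    """POST a TTS request to the n8n webhook and return the parsed JSON."""
    payload = json.dumps({"text": text, "voice": voice}).encode("utf-8")
    req = urllib.request.Request(
        webhook_url, data=payload, headers={"Content-Type": "application/json"}
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read().decode("utf-8"))

# Example:
# resp = request_tts("http://your-n8n-instance/webhook/tts", "Hello", "en-GB-SoniaNeural")
# decode_audio(resp, "reply.mp3")
```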
💡 Fast-Track Your Project: Don’t want to configure this yourself? I build custom n8n pipelines and bots. Message me with code SYS3-HUGO.
Building a Production Workflow
Let me show you a real-world example: a workflow that converts blog posts into audio summaries and uploads them to cloud storage.
I’ll build this with multiple steps:
Step 1: Fetch blog post content (HTTP Request node)
GET https://api.example.com/posts/latest
Authentication: Bearer YOUR_API_TOKEN
Response format: JSON with "title" and "content" fields
Step 2: Summarize the content (optional—use OpenAI or similar)
If your blog post is 3,000 words, you might want to summarize it to 500 words before generating audio (shorter file, faster generation).
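If you'd rather keep the full post instead of summarizing, another option is splitting long text into chunks and synthesizing each one separately. A rough sentence-based splitter (a sketch of my own, not part of edge-tts):

```python
import re
from typing import List

def chunk_text(text: str, max_chars: int = 2000) -> List[str]:
    """Split text into chunks of at most max_chars, breaking at sentence ends.

    A single sentence longer than max_chars still becomes its own
    (oversized) chunk, so keep max_chars generous.
    """
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    chunks: List[str] = []
    current = ""
    for sentence in sentences:
        if current and len(current) + len(sentence) + 1 > max_chars:
            chunks.append(current)
            current = sentence
        else:
            current = f"{current} {sentence}".strip()
    if current:
        chunks.append(current)
    return chunks
```

Feed each chunk to edge-tts separately, then concatenate the MP3s with a tool like ffmpeg.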
Step 3: Generate audio with Edge TTS (Function node)
import edge_tts
import asyncio
import os
from datetime import datetime

async def create_podcast_audio(post_title, post_summary, voice_choice):
    timestamp = datetime.now().strftime("%Y%m%d_%H%M%S")
    filename = f"/tmp/podcast_{timestamp}.mp3"
    full_text = f"{post_title}. {post_summary}"
    # rate="-10%" slows delivery slightly for a podcast-style read
    communicate = edge_tts.Communicate(text=full_text, voice=voice_choice, rate="-10%")
    await communicate.save(filename)
    file_size = os.path.getsize(filename)
    return {
        "file_path": filename,
        "file_size": file_size,
        "timestamp": timestamp,
        # Rough duration in minutes, assuming ~150 spoken words per minute
        "duration_estimate": round(len(full_text.split()) / 150, 1),
    }

# In n8n's Python Code node, the previous node's output is available via _input
payload = _input.first().json["payload"]
title = payload["title"]
summary = payload["content"]
voice = "en-US-AriaNeural"

result = asyncio.run(create_podcast_audio(title, summary, voice))
return [{"json": result}]
Step 4: Upload to AWS S3 (or any cloud storage)
Use n8n’s built-in S3 node:
{
  "bucket": "your-bucket-name",
  "key": "podcasts/{{ $node['Function'].json.timestamp }}.mp3",
  "fileContent": "{{ $node['Function'].json.file_path }}",
  "acl": "public-read"
}
Note that the S3 node uploads file contents, not a path—if your Function step only returns a path, read the file into a binary property first.
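One gotcha here: storage nodes generally want the file's contents, not its path. If your Function step only returns a path, a small helper like this (the name file_to_base64 is mine) reads the MP3 back as base64 so a later node can upload it:

```python
import base64

def file_to_base64(file_path: str) -> str:
    """Read a file from disk and return its contents as a base64 string."""
    with open(file_path, "rb") as f:
        return base64.b64encode(f.read()).decode("utf-8")

# In the n8n Function step, attach it to the returned item, e.g.:
# result["file_base64"] = file_to_base64(result["file_path"])
```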
Step 5: Send notification (Slack, Email, or Webhook)
{
  "text": "New podcast audio generated",
  "blocks": [
    {
      "type": "section",
      "text": {
        "type": "mrkdwn",
        "text": "*Blog Audio Generated*\n*Title:* {{ $node['HTTP Request'].json.title }}\n*File Size:* {{ $node['Function'].json.file_size }} bytes\n*Listen:* https://your-bucket.s3.amazonaws.com/podcasts/{{ $node['Function'].json.timestamp }}.mp3"
      }
    }
  ]
}
This workflow runs on a schedule or webhook trigger, converts your latest blog post to audio, uploads it, and notifies your team—all without a single manual step.
If you want to go further, combine this with the techniques in How to Monitor Your VPS with n8n Health Check Workflows to ensure your audio generation service stays healthy, or use How to Build a Telegram Bot with n8n (No Code Required) to let users request audio generation on-demand.
Advanced: Handling Large-Scale Audio Generation
If you’re generating hundreds of audio files, edge-tts can hit rate limits (Microsoft’s servers will temporarily block you). Here’s a production-ready approach:
import edge_tts
import asyncio
from typing import List

async def generate_audio_batch(text_list: List[str], voice: str, delay_seconds: int = 2):
    results = []
    for idx, text in enumerate(text_list):
        try:
            output_file = f"/tmp/batch_{idx:04d}.mp3"
            communicate = edge_tts.Communicate(text=text, voice=voice, rate="+0%")
            await communicate.save(output_file)
            results.append({"index": idx, "file": output_file, "status": "success"})
        except Exception as exc:
            results.append({"index": idx, "status": "error", "error": str(exc)})
        # Pause between requests so Microsoft's servers don't throttle you
        await asyncio.sleep(delay_seconds)
    return results
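If a request still gets throttled despite the fixed delay, wrapping each save in a retry with exponential backoff helps. A generic sketch—the helper with_retries is my own, not part of edge-tts:

```python
import asyncio
import random

async def with_retries(make_attempt, max_attempts: int = 3, base_delay: float = 2.0):
    """Run an async operation, retrying with exponential backoff plus jitter.

    make_attempt is a zero-argument callable returning a fresh coroutine.
    """
    for attempt in range(1, max_attempts + 1):
        try:
            return await make_attempt()
        except Exception:
            if attempt == max_attempts:
                raise
            # Sleep base_delay * 2^(attempt-1), plus jitter up to base_delay
            await asyncio.sleep(base_delay * 2 ** (attempt - 1) + random.uniform(0, base_delay))

# Usage inside the batch loop, e.g.:
# await with_retries(lambda: edge_tts.Communicate(text=text, voice=voice).save(output_file))
```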
Want to automate this yourself?
Start with n8n Cloud (free tier available) or self-host on a Hetzner VPS for full control.
📬 Get Weekly Automation Tips
One email per week with tutorials, tools, and workflows. No spam, unsubscribe anytime.