Monday, June 21, 2004 - Posts

Text-To-Speech (to MP3) for Technical Content

I love listening to .NET Rocks while driving in my car.  The problem is, I finish each episode in a single day. 

About a year ago, I wanted to see if my other commuting and lawn mowing hours could be spent tackling my "to read" list (see also my "Stack" post).  I experimented with the available Text-To-Speech (TTS) tools to see if they would be useful for converting technical articles to MP3s so I could listen to them on my Pocket PC.

I wasn't satisfied with the tools available at the time: the speech quality was often miserable, and rendering to MP3 generally took more than one tool.  Now, I'm ready to take a look at the current crop.

Some searching has turned up a decent number of TTS applications:

ReadPlease

ReadPlease has a helpful feature that allows you to add custom pronunciations (e.g. “.NET = dot net”) but seems to lack an option to render to MP3.

TextAloud MP3

I liked TextAloud MP3.  The interface is a bit, well, VB4'ish, but it gets the job done.  I wish I could customize it like ReadPlease, however. 

Alive Text to Speech

Alive Text to Speech is decent as well, but I don't think the voice synthesis is as good as TextAloud's.

Aye Text to MP3

I also liked Aye Text to MP3.  It has a nice clean interface and “just works.”

SayPad

I probably didn't give SayPad enough time, but I quickly realized it wasn't something I really wanted to use.  It might be a good program, but I only used it for a few minutes.

2nd Speech Center - EDITOR'S CHOICE :)

I liked 2nd Speech Center the most.  The interface is good, speech quality is superior, and you can add custom pronunciations, choose between many voices and record to multiple MP3 files with one click.  You need to register it to avoid having a nag tag inserted into your MP3 (but I suppose it would be trivial to edit it out).

Of course, these tools work best when the text has very little code ("...public int three two computesome left parenthesis right parenthesis left brace...").  I'm willing to "listenread" an article and go back and review the code when I'm in front of a computer again, but only to a certain extent.

While most of the above tools are good for simple book text, they all seem to have problems when too many acronyms are present: they try to turn letters that should be pronounced individually into syllables of words (e.g. MSBuild as "muzzbuild").  Not to fault the programs - it is a very hard scenario to detect with code - however, the end result is still quite annoying.  Being able to program custom pronunciations is very helpful, but given the large number of buzzwords and acronyms we have in the .NET world, it would be impractical to add all but the most common.
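
One workaround I can imagine is cleaning up the text before it ever reaches the TTS tool.  Here is a rough Python sketch of that idea; it is not how any of the programs above work.  The pronunciation table, the function names and the "spell out any run of 2-5 capital letters" rule are all my own assumptions, and the rule is only a crude heuristic.

```python
import re

# Hypothetical pronunciation table; the entries are illustrative only.
PRONUNCIATIONS = {
    ".NET": "dot net",
    "MSBuild": "M S build",
}


def apply_pronunciations(text, table=PRONUNCIATIONS):
    """Replace each known term with its spoken form, longest terms first."""
    for term in sorted(table, key=len, reverse=True):
        # Lookarounds avoid rewriting the middle of a longer word.
        pattern = r"(?<![A-Za-z0-9])" + re.escape(term) + r"(?![A-Za-z0-9])"
        # A function replacement keeps backslashes in the spoken form literal.
        text = re.sub(pattern, lambda m, spoken=table[term]: spoken, text)
    return text


def spell_out_acronyms(text):
    """Put spaces between the letters of all-caps words of 2-5 letters."""
    return re.sub(r"\b[A-Z]{2,5}\b", lambda m: " ".join(m.group(0)), text)


def prepare_for_tts(text):
    """Apply the custom table first, then the generic acronym heuristic."""
    return spell_out_acronyms(apply_pronunciations(text))


if __name__ == "__main__":
    print(prepare_for_tts("MSBuild ships with .NET and reads XML."))
```

The table runs first so that acronyms normally spoken as words can be given a lowercase entry and escape the spell-out rule.  Anything the table misses still gets the heuristic, which will sometimes be wrong, so this reduces the problem rather than solving it.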

Any better tools out there?  Is anyone using TTS for technical content?  Please chime in if you have any tips!

-Chris