Using Best-Of-Vox SAPI voices with Linux x64

What device?

Only x86 Linux since we’ll be using Wine to run a SAPI server.

Under which conditions?

On Debian Linux Jessie

What to expect?

Excellent-quality text-to-speech (TTS from here on) for any textual input, in French (English would work too, but English voices are more common), without an Internet connection, using Best-Of-Vox voices.

What were the pitfalls?

Wine and Xvfb have to be running (but they take almost no memory or CPU while idle).

Best-Of-Vox voices cost $39 in the SAPI version but only $3 in the Android version. Why?

Why bother using SAPI and emulating Win32?

Following my previous article, we have narrowed down the voice providers to Cereproc and Voxygen. I contacted the Voxygen team and, while they were very friendly, their commercial embedded Linux option is out of my budget. However, since they sell a lot of voices, either for Android or for SAPI, and since I've already covered how to use Android voices, this time I tried the x86-based SAPI voices.

Since Intel's Bay Trail generation, x64 hardware is available with roughly the same performance per watt as ARM devices, which opens the door to many existing libraries and, most notably, to Windows software.

Step by step

Step 1: Build and install Wine

First, you need to download a recent Wine, since Jessie does not ship a version that meets the minimum requirement for SAPI for Linux (S4L for short). You also need a debug build, because using a debugger is required later on.

As of the day of writing, I've downloaded version 1.7.41.

Because Wine runs 32-bit x86 code (it emulates the Win32 environment, not the CPU), you'll need to install multiarch libraries so that 32-bit code can run on your 64-bit system. On Debian it's easy, just type:

$ sudo dpkg --add-architecture i386
$ sudo apt-get update
$ sudo apt-get build-dep wine

Then go into the folder where you extracted the Wine archive, and run the usual autogen/configure/make/sudo make install sequence.

Wine is a real mess to build and requires many different tricks (when I hacked on it, I had to google a lot to be able to build it with minimal dependencies).
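The build sequence can be wrapped in a small shell function. This is only a sketch under a few assumptions of mine: release tarballs normally ship a ready-made configure script, so it starts at configure; the function name build_wine and the "-g -O0" flags are my own choice (unoptimized code tends to be easier to inspect in winedbg later), not something the Wine project requires.

```shell
# Sketch: build and install Wine from an already extracted source tree.
# build_wine and CFLAGS="-g -O0" are illustrative assumptions, not an official recipe.
build_wine() {
    src_dir=$1
    (
        # Work in a subshell so the caller's current directory is untouched.
        cd "$src_dir" || exit 1
        # Keep debug symbols and disable optimisation for easier debugging.
        CFLAGS="-g -O0" ./configure || exit 1
        make -j"$(nproc)" || exit 1
        sudo make install
    )
}
```

Usage would be something like: build_wine ~/src/wine-1.7.41 (the path is just an example). If configure complains about missing 32-bit development packages, install them and run it again.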

Step 2: Install S4L

Follow instructions from here.

Update: The website seems down so here’s the link to the file you need to download.

You can ignore the failing tests in pyspda.

At the end of the process, you should be able to use sapilektor to make a nice “Microsoft SAM” jingle’s wav file (if not, re-read the installation instructions from the site above, you’ve probably missed something).

Step 3: Install Best-Of-Vox voice

Once you've ordered the voice you like on the Best-Of-Vox website, you'll receive an email with a link to download a Win32 .exe installer, and a "coupon", which is what you need to get a license.

Start an X server on your computer (or remotely, as I did), and set the WINEPREFIX you need:

$ export WINEPREFIX=/home/whatever/.winesapi
$ winecfg 
-> Change windows version to WinXP
$ winetricks winhttp
-> If you don't have winetricks, you'll find it here:
$ wine BOV-SAPI-Helene_7.3.1.4_1.exe
-> This might or might not show a crash dump while installing for a "update.exe" process, but you don't care

When the installation dialog shows "enter license code", you'll need to copy in the code you received by email after your purchase.

However, the installation will crash or fail if you press "Next", so you need a "hack" to get past this step.

Until you get the "Timeout error" message box, you'll need to experiment with winetricks (a default Wine installation typically lacks many basic features in the DLLs it implements, and you need to download many DLLs from Microsoft). Winetricks does this for you. Unfortunately, I've installed many of them, and I don't know which ones are absolutely necessary. Here's what I have in my wineprefix:

$ ./winetricks list-installed

-> install with "winetricks wininet winhttp wsh56vb etc..."

If you get a "Message 42" or any error other than "timeout", then play with winetricks. You cannot continue until you get the "Timeout" error message box.

I've unpacked the installer and disassembled its script. The installer tries to POST a lot of system-specific information to a remote license server. Unfortunately, Wine's default socket implementation does not map 1:1 to what the installer expects, so the request times out and you can't get past this last installation step.
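To see quickly which winetricks verbs are still missing from a prefix, a small filter can help. This is a sketch that assumes "winetricks list-installed" prints one verb per line; the function name missing_verbs is mine, and the verbs in the usage line are just the ones named above, not a verified minimal set.

```shell
# Sketch: read the output of "winetricks list-installed" on stdin and print
# every wanted verb (given as arguments) that is not in that list.
# Assumes one installed verb per input line.
missing_verbs() {
    installed=$(cat)
    for verb in "$@"; do
        # -x: match the whole line, -F: treat the verb as a literal string.
        printf '%s\n' "$installed" | grep -qxF -- "$verb" || printf '%s\n' "$verb"
    done
}
```

For example: ./winetricks list-installed | missing_verbs wininet winhttp wsh56vb
Whatever it prints can then be passed to winetricks for installation.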

The returned data from the server is a license file (a .lic file) that’s absolutely required to get the SAPI voice to work.

So you need to hack it, and here’s how:

Start the installation and go up to the serial code dialog. Enter the code, but don't press the "Next" button.

In another terminal, enter these commands:

$ winedbg
Winedbg> info process
pid threads executable (all id:s are in hex)
 00000010 1 'explorer.exe'
 0000000e 5 'services.exe'
[...] // Look up the pid of BOV-SAPI...7.4.1.tmp *the .tmp process is the important one*
Winedbg> attach 0x23 // Use PID from the list above
0xf77a8d5e __kernel_vsyscall+0xe in [vdso].so: int $0x80
// In the installer window, press the "Next" button, then quickly press Ctrl+c in this terminal
Winedbg> <Ctrl+c>
Ctrl-C: stopping debuggee
0xf774dd5e __kernel_vsyscall+0xe in [vdso].so: int $0x80
Winedbg> bt
// Walk up with "up" command until you get into the socket code
Winedbg> up 
Winedbg> list
ret = recv(fd, msg, len, flags);
Winedbg> fini
Winedbg> set ret = 1 // This replaces the timeout return code from recv with a success value
Winedbg> c

Voilà, you should now get the last message from the installer dialog and, more importantly, the license file has been created and the installer has written its registry entries, so the SAPI voice works.

In my case, the Helene voice was listed by the "sapiconfig" command, so I only needed to validate it with "sapiconfig -s", and I was able to use it with sapilektor and sapitest too.

Without this hack (that is, when interrupting the installer abruptly at the license page), SAPI failed with a 0x80045012 error (exception in SAPI engine) when using the Helene voice.
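Since timing matters in the attach step, it can help to have the pid ready to paste. The helper below is a sketch that assumes the "info process" listing looks like the one in the session above (pid in the first column, executable name at the end of the line); the function name find_installer_pid is mine.

```shell
# Sketch: read a saved "info process" listing on stdin and print the pid of
# the first process whose name contains ".tmp", prefixed with 0x so that it
# can be pasted after winedbg's "attach" command.
find_installer_pid() {
    awk '/\.tmp/ { print "0x" $1; exit }'
}
```

For example, paste the listing into a file and run: find_installer_pid < processes.txt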

Also, if the above does not work for you, you can replace the “Ctrl+c” sequence while waiting in winedbg by:

Winedbg> b WinHttpSendRequest
Winedbg> c
// enter the coupon code and when it breaks in the debugger:
// Inspect the stack
Winedbg> x/1x $ebp+0x14
 0017c65c  // This will change for you
Winedbg> x/1024c 0x0017c65c // Use the number from your output
0x0017c65c: p r o t o c o l v e r s i o n = 
0x0017c67c: 1 & p r o d u c t = B O V _ S A 
0x0017c69c: P I & e n g i n e v e r s i o n 
0x0017c6bc: = 7 . 3 & u i d = x x x x x x x 

Copy the last part into a text editor and remove all the extra spaces (so you get "protocolversion=1&product=…")

Then use curl to POST the data above to the URL the installer posts to. If you are very careful, you'll get back a license file that you can save in your installation folder as "\voices\Helene\licence\voxygen.lic"
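Both manual steps can be sketched in shell. This assumes the dump lines have the "0xADDRESS: c h a r s" shape shown above and that the POST body contains no real spaces. The names dump_to_post_body, fetch_license and LICENSE_URL are mine, and LICENSE_URL is deliberately left empty because the real URL has to come from your own debugging session.

```shell
# Placeholder (assumption): set this to the URL the installer posts to.
LICENSE_URL=${LICENSE_URL:-}

# Sketch: turn winedbg "x/1024c" dump lines (stdin) into one string by
# dropping the address prefix, the separating spaces and the newlines.
# Anything dumped past the end of the query string must be trimmed by hand.
dump_to_post_body() {
    sed -e 's/^[[:space:]]*0x[0-9a-fA-F]*:[[:space:]]*//' | tr -d ' \n'
}

# Sketch: POST the body ($1) and save the server's answer to a file ($2).
fetch_license() {
    [ -n "$LICENSE_URL" ] || { echo "set LICENSE_URL first" >&2; return 1; }
    curl --fail --silent --show-error --data "$1" --output "$2" "$LICENSE_URL"
}
```

Typical use would be: body=$(dump_to_post_body < dump.txt), then fetch_license "$body" voxygen.lic, and finally copying the result into the licence folder of the voice. Check that the saved file really is a license and not an error page before relying on it.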

This is very painful to get right, so the “replace the return value from recv” method is much easier.

Good luck.
