What device?
Any Linux-supported device able to run Android: x86, ARM, MIPS.
Under which conditions?
On Debian Linux Wheezy (or Android; ARM preferred).
What to expect?
An excellent-quality text-to-speech (TTS from now on) engine for any textual input, in French (English would do too, but it is more common), without an Internet connection.
What were the pitfalls?
The usual solutions have very poor quality, when they work at all.
1) Analysis of existing solutions
For anyone interested in a TTS solution running on Linux, a little googling will return the following:
eSpeak is a synthesis solution that generates "speech" from a physical model of the vocal tract (larynx, tongue, etc.). The speech is terrible: completely robotic, barely intelligible, not acceptable. However, eSpeak provides a very useful text-to-phoneme conversion.
MaryTTS is the most promising solution (the speech quality is much better than eSpeak's, but still very far from a commercial product), but its Java client/server architecture will limit the casual hobbyist. Technically, this project gives the best speech of all the open-source engines I have found so far.
Festival and Flite are not worth any criticism; they are so bad in every respect that one should not even try them.
PicoTTS is an "open <crappy> source" solution from the time Google, in its infinite wisdom, decided to have an open-source solution for its text-to-speech technology. The quality is not that bad, but the source code is "take it as it is": it is essentially frozen and almost unmodifiable, since the engine is just a "player" for a voice database that is neither documented nor described.
Later Android revisions changed the engine to something better. (In fact, PicoTTS was originally written by SVOX, which was in turn bought by Nuance. Nuance is well known for aggressively swallowing its competitors to remove any competition from the market.)
2) Other options
If you check the available TTS vendors, you'll find very good engines and voices from Cereproc, SVOX, NeoSpeech and Voxygen (my four best choices).
It's very important to understand the amount of effort required to build a (perfect) voice from scratch; if you don't, you won't understand the next paragraph.
Making a voice requires very consistent reading of a text (no emotion, as pleasant as possible) for numerous hours (12 hours is a minimum).
Then you must segment each phoneme (software can "pre-split" it for you, but the result is usually bad) and let a computer build a model of your voice. Typically two models are built: one called "unit selection", the other a "hidden Markov model". The former is used for common words and sounds perfect; the latter is used for unknown words and sounds a little worse.
I contacted them and asked for quotations. They were all very friendly, and if I had to start a professional solution I would really build on their work; the prices they asked are not illogical compared to the work achieved.
Cereproc has a REST- and SOAP-based TTS solution that is very easy to use (two hours of work to get it working). Its price is not that high, as long as you don't intend to read a whole book with it. However, like all other cloud services, it needs an Internet connection.
However, I'm no professional, so I have to solve my problems with my own brain.
3) Chosen solution
Anyone using an Android phone will have noticed the number of voices available for almost nothing (the engine itself is always free on Android; why?).
I wondered whether it was possible to use these voices on a usual Linux system. You first need to purchase them (usually less than $4 per voice) and download them to your Android phone.
Then the hacking starts…
3.1) Backup of the voice engine and data
It's usual for Android applications to be in a locked state that prevents you, the content owner, from backing up the application for analysis. It's meant to prevent copying but, as with most DRM, it's useless, because any DRM must provide a way to read the content (otherwise it's just called "noise"). Typically, you'll have to find a rooted phone or tablet (they are legion), navigate to the application folder and save it to an SD card for later analysis.
3.2) Disassembly of an application
There are numerous guides about Android APK disassembly. An APK is a zip file, and the jar inside is DEX-encoded, so you need a dex2jar tool to get back a usual jar file.
Then you can use JD to decompile that jar into Java source code (minus the variable names).
Voice synthesis vendors usually write their engine in C and bind it to Java via JNI. The engine is delivered as a dynamic library (a .so file), and some Java glue code links the Android API to their internal API.
This library is compiled for the ARM platform (most of the time; I've yet to see a TTS engine working on x86 Android).
If the application requires Android 2.3 (Gingerbread), the .so file will have to run on ARMv5 (otherwise, ARMv7 might be the minimum supported architecture).
3.3) Using the engine.so file
Typically, the .so file cannot be used directly on an ARMv5 Linux system, because:
1. Android's dynamic linker is completely different from glibc's;
2. Android does not use glibc but a minimal libc (called Bionic);
3. you don't have a header for the library.
To solve problems 1 and 2, you can use a project called libhybris.
libhybris emulates the Android’s linker and specificities of the Bionic C library.
To solve problem 3, you'll need to write a minimal header yourself, based on the decompiled Java source you got in step 3.2 (and a bit of "objdump -tTC engine.so").
Typically, in C, you don't need to know the layout of a structure as long as it's an opaque container and you only deal with pointers. You'll write code like this:

```c
struct MyStruct;   /* opaque: layout unknown, pointers only */

/* engine_create is an exported symbol from the library,
   and the JNI wrapper seems to call it this way: */
struct MyStruct *engine_create();
int get_sample_rate(struct MyStruct *, int channel);
/* etc. */
```
3.4) Let’s put that together
So, you have written a minimal header. You now need to write a bridge from this header to the engine.so file you got from the Android package; for that, you'll need the Android NDK.
You might want to follow those steps.
If you do everything right, you'll get something that compiles and links (quite an achievement by itself).
Then you run it, and… it fails.
You'll then have to understand why, where and how.
The most likely causes are a bad header declaration, or linking that does not find the expected files in the right place. GDB is your friend, but understanding what happens here is not for the faint of heart; build libhybris with debug symbols and things will be much clearer.
I have not succeeded in getting engine.so to work with this solution (even after fixing all the dynamic-linking errors and the Bionic C library functions). Either the engine's functions do not behave as their JNI wrapper suggests, or they call something Android-specific that libhybris does not "fix" yet.
3.5) Fallback solution
Since using a patcher for a library is tricky by itself and (in my case) did not work, I decided to call the library directly on Android itself. I have a BeagleBone Black at home, so installing and running Android on it is very easy, and it costs little more than the price of the BBB (electricity doesn't count; the BBB is always on in my setup anyway).
I wrote a minimal wrapper around the library (the same one I was using with the libhybris solution) and built it with the NDK using this Makefile:
```makefile
AR  = arm-linux-androideabi-ar
AS  = arm-linux-androideabi-as
CC  = arm-linux-androideabi-gcc
CXX = arm-linux-androideabi-c++
LD  = arm-linux-androideabi-ld

NDK_KIT   = /home/user/android-ndk-r9d/
PLATF_KIT = platforms/android-9/

ARM_INC = $(NDK_KIT)/$(PLATF_KIT)/arch-arm/usr/include
ARM_LIB = $(NDK_KIT)/$(PLATF_KIT)/arch-arm/usr/lib

OBJS = main.o
EXES = test

test: main.o
	$(LD) \
	  --dynamic-linker /system/bin/linker -nostdlib \
	  -rpath /system/lib -rpath $(ARM_LIB) -rpath-link $(ARM_LIB) \
	  $(ARM_LIB)/crtend_android.o $(ARM_LIB)/crtbegin_dynamic.o \
	  -L$(ARM_LIB) -lc -L. -lengine \
	  $(NDK_KIT)/toolchains/arm-linux-androideabi-4.6/prebuilt/linux-x86_64/lib/gcc/arm-linux-androideabi/4.6/libgcc.a \
	  -o $@ main.o

main.o: main.c
	$(CC) -I $(ARM_INC) -g -c main.c

clean:
	rm -f $(OBJS) $(EXES)
```
Then I ran the produced binary on the Android system… and it worked!
I then built a small TCP server around it and, as far as I have tested it, I have a TTS engine with excellent voice quality, in my language, that works on Linux.