
Requirements

A small, fast run-time synthesis library that can be used to deploy robust, high-quality synthetic voices, including (and particularly) concatenative voices, is desirable for many uses. Also, as we are addressing some of the core issues of Festival, we can consider aspects that were not considered important, or were not fully fleshed out, when Festival was first designed.

portability: As we expect Flite to run on very small processors, such as those in most embedded systems, wearable computers, and personal computing and communication devices, it must be portable, more portable than a C++ codebase allows; voices built upon Flite can then be deployed on more systems.
maintenance: One of the main maintenance issues for Festival is updating the code to keep it in line with the currently released versions of C++ under a myriad of twisty little compilers, all different, which is a never-ending task. Using ANSI C reduces that maintenance burden.
code size: C++ is good at hiding access methods from the user, but often at the cost of generating more code than is necessary. Moving to C gives us more control over the size of the generated code.
data size: Most of the size of a synthesis system lies in the data rather than the code. Festival mostly loads data into internal structures; this requires space for both the disk footprint and the run-time memory copy. We wanted to avoid this double requirement and have structures that can be used directly, avoiding both the time-consuming reading and the duplicate memory. We expect that, in some applications, much of this data will be in ROM.
thread safety: Although Festival runs on Windows systems, it is still UNIX-centric in its view of memory management. The client/server framework depends on fast forking and copy-on-write memory management for efficient use; this is not an efficient model under the Windows operating system, nor under the smaller operating systems used in embedded systems. The most common question about Festival from Windows developers is whether it is "thread safe," that is, whether multiple threads (execution paths) can be run over the same instance of the code. Because of the use of global variables in various places in Festival, it is not thread safe, except on operating systems that implement fast forking and copy-on-write. To make it so would take some work, but in rebuilding a system it is something that can be addressed, as it has been in Flite.
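
The data size and thread safety points can be illustrated together with a minimal sketch. The names used here (`demo_lex_entry`, `demo_lexicon`, `demo_lookup`) and the phone strings are hypothetical and are not taken from the Flite sources; the sketch only assumes that voice data can be expressed as constant C initializers.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical lexicon entry: a word and an illustrative phone string. */
typedef struct demo_lex_entry {
    const char *word;
    const char *phones;
} demo_lex_entry;

/* Declared const and initialized at compile time, so a linker is free to
   place the table in read-only memory (possibly ROM). Nothing is parsed
   or copied at start-up. Entries are kept sorted by word. */
static const demo_lex_entry demo_lexicon[] = {
    { "fast",  "f ae s t" },
    { "small", "s m ao l" },
    { "voice", "v oy s" },
};

#define DEMO_LEXICON_SIZE (sizeof(demo_lexicon) / sizeof(demo_lexicon[0]))

/* Binary search over the read-only table. The function reads and writes
   no mutable global state, so concurrent calls from several threads do
   not interfere with one another. Returns NULL if the word is absent. */
const char *demo_lookup(const char *word)
{
    size_t lo = 0, hi = DEMO_LEXICON_SIZE;
    while (lo < hi) {
        size_t mid = lo + (hi - lo) / 2;
        int cmp = strcmp(word, demo_lexicon[mid].word);
        if (cmp == 0)
            return demo_lexicon[mid].phones;
        if (cmp < 0)
            hi = mid;
        else
            lo = mid + 1;
    }
    return NULL;
}
```

A caller would simply write `demo_lookup("small")`; the table is used in place, with no load step and no second copy in memory.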

These requirements have consequences. Although we are advocating ANSI C to allow more direct control of the code, we are not advocating an abandonment of the object oriented paradigm. We still implement objects in C with appropriate constructors, destructors and methods, but of course without the explicit help of the C++ compiler. Thus, with more control comes more responsibility, as the syntactic scaffolding that C++ provides for object oriented programming is removed.
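
As a rough illustration of objects in C, the sketch below shows a constructor, a method, and a destructor for a hypothetical `demo_wave` type. The type and function names are assumptions made for this example and do not reproduce Flite's actual API.

```c
#include <stdlib.h>

/* Hypothetical "class": a buffer of 16-bit audio samples. */
typedef struct demo_wave {
    int sample_rate;
    size_t num_samples;
    short *samples;
} demo_wave;

/* Constructor: allocates and zero-initializes the object.
   Returns NULL if memory cannot be obtained. */
demo_wave *demo_wave_new(int sample_rate, size_t num_samples)
{
    demo_wave *w = malloc(sizeof(*w));
    if (w == NULL)
        return NULL;
    /* Allocate at least one sample so a zero-length wave is still valid. */
    w->samples = calloc(num_samples ? num_samples : 1, sizeof(short));
    if (w->samples == NULL) {
        free(w);
        return NULL;
    }
    w->sample_rate = sample_rate;
    w->num_samples = num_samples;
    return w;
}

/* Method: the object is passed explicitly, where C++ would pass `this`.
   Returns the duration in seconds. */
double demo_wave_duration(const demo_wave *w)
{
    return (double)w->num_samples / (double)w->sample_rate;
}

/* Destructor: the caller must invoke it by hand; safe to call on NULL. */
void demo_wave_delete(demo_wave *w)
{
    if (w == NULL)
        return;
    free(w->samples);
    free(w);
}
```

The added responsibility is visible here: nothing calls `demo_wave_delete` automatically, and nothing prevents a caller from touching the struct fields directly.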

The next thing to consider was really two-fold: what do we keep from Festival, and what do we throw away? To answer this, we need to properly define the run-time environment for Flite. We expect Flite to be running in a constant environment where little changes, so giving up some of the run-time flexibility of Festival is acceptable. We therefore decided to drop the scripting language, Scheme, from Flite. Although many may initially applaud that, the result is that run-time configuration of low-level system parameters is harder, and more changes require recompilation of the binary object code.

We also want Flite to be closely compatible with Festival. Flite is not a different synthesizer as such; it is a library that provides all the routines for an alternative run-time engine for voices within the existing free software synthesis tool set. Thus we need not only the library, but also a clear and, if possible, automatic route for converting voices and models built for Festival into voices and models that can be linked against Flite to form synthesizers. Given the voice building tools distributed through the Festvox project, we know this is a viable route. Voices can be built and debugged in Festival and, once stable, can be converted to Flite-based voices.

But to be compatible and to allow existing models to simply be compiled, we do need to follow certain key architecture choices when designing voices within Festival. The first is the internal utterance structure. Heterogeneous Relation Graphs [7] were designed specifically to be good for synthesis. An HRG consists of a set of relations, each of which is a structure (e.g. a list or tree) over some set of items. Items may appear in multiple relations and may contain a set of features and values. Thus they are a good general structure, and they are already being used in Festival. That structure is preserved in the Flite library, under a completely new implementation.
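
One possible C layout for such a structure is sketched below. It is an assumption made for illustration (the `demo_` names and the fixed feature capacity are hypothetical) and is not claimed to match the Flite implementation. The key idea shown is that each appearance of an item in a relation is a separate node, while the features live in a contents record shared by all appearances.

```c
#include <stddef.h>
#include <string.h>

#define DEMO_MAX_FEATS 4  /* small fixed capacity, enough for a sketch */

/* Feature name/value pairs shared by every appearance of an item. */
typedef struct demo_contents {
    const char *names[DEMO_MAX_FEATS];
    const char *values[DEMO_MAX_FEATS];
    int num_feats;
} demo_contents;

/* One appearance of an item inside one named relation. */
typedef struct demo_item {
    const char *relation;                 /* relation this node lives in */
    demo_contents *contents;              /* shared with other appearances */
    struct demo_item *next, *prev;        /* links for list relations */
    struct demo_item *parent, *daughter;  /* links for tree relations */
} demo_item;

/* Set a feature on the shared contents. Returns 1 on success and 0 if
   the fixed-size feature table is full. */
int demo_set_feat(demo_item *it, const char *name, const char *value)
{
    demo_contents *c = it->contents;
    int i;
    for (i = 0; i < c->num_feats; i++) {
        if (strcmp(c->names[i], name) == 0) {
            c->values[i] = value;
            return 1;
        }
    }
    if (c->num_feats == DEMO_MAX_FEATS)
        return 0;
    c->names[c->num_feats] = name;
    c->values[c->num_feats] = value;
    c->num_feats++;
    return 1;
}

/* Look up a feature; returns NULL if it has not been set. */
const char *demo_get_feat(const demo_item *it, const char *name)
{
    const demo_contents *c = it->contents;
    int i;
    for (i = 0; i < c->num_feats; i++)
        if (strcmp(c->names[i], name) == 0)
            return c->values[i];
    return NULL;
}
```

With this layout, a feature set through an item's node in one relation is immediately visible through its node in any other relation, because both nodes point at the same `demo_contents`.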

Importantly, using HRGs means that feature pathnames, which are fundamental to most of the statistical models used in Festival, will be compatible. Feature pathnames in HRGs are a well-defined method for referring to related parts of an HRG. Given an item, we can use the pathname formalism to refer to relative values around it. Directional control through the directives n, p, parent, daughter, etc. allows traversing the current relation, while the directive R:RELNAME allows jumping into another relation. For example

n.R:SylStructure.parent.stress

is interpreted as moving from the current item to the next item in the current relation, crossing into the SylStructure relation, then moving to the parent item and returning the value of its stress feature.
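
The interpretation just described can be sketched as a small evaluator. This is a simplified illustration under stated assumptions: the `px_` types and functions are hypothetical, only the directives named above (n, p, parent, daughter, R:RELNAME) are handled, and feature values are plain strings.

```c
#include <stddef.h>
#include <string.h>

#define PX_MAX 4  /* small fixed capacity, enough for a sketch */

struct px_item;

/* Data shared by every appearance of one item. */
typedef struct px_contents {
    const char *feat_names[PX_MAX];
    const char *feat_values[PX_MAX];
    int num_feats;
    const char *rel_names[PX_MAX];      /* relations the item appears in */
    struct px_item *rel_items[PX_MAX];  /* its node in each of them */
    int num_rels;
} px_contents;

/* One appearance of an item within one relation. */
typedef struct px_item {
    px_contents *contents;
    struct px_item *n, *p, *parent, *daughter;
} px_item;

/* Apply one directive of length `len` to an item. Returns NULL if the
   step is impossible or the directive is not recognized. */
static px_item *px_step(px_item *it, const char *tok, size_t len)
{
    int i;
    if (len == 1 && tok[0] == 'n') return it->n;
    if (len == 1 && tok[0] == 'p') return it->p;
    if (len == 6 && strncmp(tok, "parent", 6) == 0) return it->parent;
    if (len == 8 && strncmp(tok, "daughter", 8) == 0) return it->daughter;
    if (len > 2 && strncmp(tok, "R:", 2) == 0) {
        /* Jump to this item's node in the named relation, if any. */
        for (i = 0; i < it->contents->num_rels; i++) {
            const char *name = it->contents->rel_names[i];
            if (strlen(name) == len - 2 &&
                strncmp(name, tok + 2, len - 2) == 0)
                return it->contents->rel_items[i];
        }
    }
    return NULL;
}

/* Evaluate a dot-separated pathname starting from `it`. Every component
   except the last is a directive; the last names the feature to return.
   Returns NULL if the path cannot be followed or the feature is unset. */
const char *px_eval(px_item *it, const char *path)
{
    const char *dot;
    int i;
    while (it != NULL && (dot = strchr(path, '.')) != NULL) {
        it = px_step(it, path, (size_t)(dot - path));
        path = dot + 1;
    }
    if (it == NULL)
        return NULL;
    for (i = 0; i < it->contents->num_feats; i++)
        if (strcmp(it->contents->feat_names[i], path) == 0)
            return it->contents->feat_values[i];
    return NULL;
}
```

Calling `px_eval(item, "n.R:SylStructure.parent.stress")` walks the four steps in order and yields NULL as soon as any step has nowhere to go, which is one reasonable convention for a missing neighbour or parent.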

Pathnames are fundamental to most of the statistical models used in Festival. CART trees use this mechanism in their questions to refer to the aspect being questioned. If Flite is to support easy conversion of statistical models from Festival, pathname support is the right thing to do.


Alan W Black 2001-08-26