A small, fast run-time synthesis library that can be used to deploy robust, high-quality synthetic voices, including (and particularly) concatenative voices, is desirable for many applications. Also, because we are revisiting some of the core issues of Festival, we can consider aspects that were not considered important, or were not fully fleshed out, when Festival was first designed.
These requirements have consequences. Although we are advocating ANSI C to allow more direct control of the code, we are not advocating an abandonment of the object oriented paradigm. We still implement objects in C with appropriate constructors, destructors and methods, but of course without the explicit help of the C++ compiler. Thus, with more control comes more responsibility, as the syntactic scaffolding that C++ provides for object oriented programming is removed.
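As an illustration of this style, the following is a minimal sketch of a C object with an explicit constructor, destructor and methods. The type `feat_set` and its functions are hypothetical names chosen for this example, not Flite's actual API. The point is only that the object pointer is passed explicitly to every method and that releasing memory is the caller's responsibility.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical object: a fixed-capacity set of name -> int features.
   The "class" is a struct; constructor, destructor and methods are
   ordinary functions that take the object pointer explicitly. */
typedef struct feat_set_struct {
    int size;
    int capacity;
    char **names;
    int *values;
} feat_set;

/* Constructor: allocates the object and the storage it owns.
   Returns NULL on allocation failure or a non-positive capacity. */
feat_set *new_feat_set(int capacity)
{
    feat_set *fs;
    if (capacity <= 0) return NULL;
    fs = malloc(sizeof(feat_set));
    if (fs == NULL) return NULL;
    fs->size = 0;
    fs->capacity = capacity;
    fs->names = calloc(capacity, sizeof(char *));
    fs->values = calloc(capacity, sizeof(int));
    if (fs->names == NULL || fs->values == NULL) {
        free(fs->names);
        free(fs->values);
        free(fs);
        return NULL;
    }
    return fs;
}

/* Destructor: must be called explicitly; C has no automatic cleanup. */
void delete_feat_set(feat_set *fs)
{
    int i;
    if (fs == NULL) return;
    for (i = 0; i < fs->size; i++) free(fs->names[i]);
    free(fs->names);
    free(fs->values);
    free(fs);
}

/* Method: set or overwrite a feature. Returns 0 on success, -1 if full. */
int feat_set_set_int(feat_set *fs, const char *name, int value)
{
    int i;
    assert(fs != NULL && name != NULL);
    for (i = 0; i < fs->size; i++)
        if (strcmp(fs->names[i], name) == 0) { fs->values[i] = value; return 0; }
    if (fs->size == fs->capacity) return -1;
    fs->names[fs->size] = malloc(strlen(name) + 1); /* strdup is not ANSI C */
    if (fs->names[fs->size] == NULL) return -1;
    strcpy(fs->names[fs->size], name);
    fs->values[fs->size] = value;
    fs->size++;
    return 0;
}

/* Method: look up a feature, returning def if it is absent. */
int feat_set_get_int(const feat_set *fs, const char *name, int def)
{
    int i;
    assert(fs != NULL && name != NULL);
    for (i = 0; i < fs->size; i++)
        if (strcmp(fs->names[i], name) == 0) return fs->values[i];
    return def;
}
```

A caller creates the object with `new_feat_set`, manipulates it only through the accessor functions, and must remember to call `delete_feat_set`; nothing releases it automatically as a C++ destructor would.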
The next question was two-fold: what do we keep from Festival, and what do we throw away? To answer it, we need to define the run-time environment for Flite properly. We expect Flite to run in a fairly constant environment where little changes, so giving up some of Festival's run-time flexibility is acceptable. We therefore decided to drop Festival's scripting language, Scheme, from Flite. Although many may initially applaud that decision, the consequence is that run-time configuration of low-level system parameters is harder, and more kinds of change require recompiling the binary.
We also want Flite to be closely compatible with Festival. Flite is not a different synthesizer as such; it is a library that provides all the routines for an alternative run-time engine for voices within the existing free software synthesis tool set. Thus we need not only the library but also a clear and, if possible, automatic route for converting voices and models built for Festival into voices and models that can be linked against Flite to form synthesizers. Given the voice building tools distributed through the Festvox project, we know this route is viable: voices can be built and debugged in Festival and, once stable, converted to Flite-based voices.
But to remain compatible, and to allow existing models simply to be compiled, we must follow certain key architectural choices when designing voices within Festival. The first is the internal utterance structure. Heterogeneous Relation Graphs (HRGs) [7] were designed specifically to suit synthesis. An HRG consists of a set of relations, each of which is a structure (e.g. a list or a tree) over some set of items. Items may appear in multiple relations and may contain a set of features and values. HRGs are thus a good general structure, and they are already used in Festival. That structure is preserved in the Flite library, under a completely new implementation.
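A minimal sketch of how one item can belong to several relations follows. It assumes a layout (hypothetical here, not taken from Flite's source) in which each linguistic object's features live in a single shared `contents` record, while each relation has its own `item` nodes that point at that record. For simplicity every daughter stores its parent pointer directly.

```c
#include <assert.h>
#include <stddef.h>

/* Shared feature record for one linguistic object (hypothetical layout). */
typedef struct contents_s {
    const char *name;  /* e.g. a word or syllable label */
    int stress;        /* one example feature */
} contents;

/* A node in one relation. Several items (one per relation) may point at
   the same contents, which is how an object appears in many relations. */
typedef struct item_s {
    contents *c;
    struct item_s *next, *prev;        /* siblings in this relation */
    struct item_s *parent, *daughter;  /* tree links; NULL in a flat list */
} item;

/* Link b directly after a within the same relation. */
static void item_append(item *a, item *b)
{
    assert(a != NULL && b != NULL);
    a->next = b;
    b->prev = a;
}

/* Make d the last daughter of p in a tree relation. */
static void item_add_daughter(item *p, item *d)
{
    item *last;
    assert(p != NULL && d != NULL);
    d->parent = p;
    if (p->daughter == NULL) {
        p->daughter = d;
        return;
    }
    for (last = p->daughter; last->next != NULL; last = last->next)
        ;
    item_append(last, d);
}
```

For example, a syllable can be an item in a flat Syllable list and also a daughter of a word in a SylStructure tree; setting its stress through either item changes the single shared record.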
Importantly, using HRGs means that feature pathnames, which are fundamental to most of the statistical models used in Festival, will be compatible. Feature pathnames are a well-defined method for referring to related parts of an HRG: given an item, the pathname formalism refers to values relative to it. Directives such as n (next), p (previous), parent and daughter traverse the current relation, while the directive R:RELNAME jumps into another relation. For example, the pathname
n.R:SylStructure.parent.stress
is interpreted as: move from the current item to the next item in the current relation, cross into the SylStructure relation, move to the parent item, and return the value of its stress feature.
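The interpretation of such a pathname can be sketched as a small interpreter that splits the path on dots, applies each directive in turn, and looks up the final component as a feature. This sketch assumes a hypothetical representation in which each item's shared contents records the item representing it in every relation it belongs to, so that R:RELNAME becomes a lookup. Only the directives named above (n, p, parent, daughter, R:RELNAME) are handled; the function names are illustrative, not Festival's or Flite's API.

```c
#include <assert.h>
#include <string.h>

#define MAX_FEATS 4
#define MAX_RELS 4

struct item_s;

/* Hypothetical layout: features stored once per linguistic object,
   together with the item that represents it in each relation. */
typedef struct contents_s {
    const char *feat_name[MAX_FEATS];
    const char *feat_val[MAX_FEATS];
    int nfeats;
    const char *rel_name[MAX_RELS];
    struct item_s *rel_item[MAX_RELS];
    int nrels;
} contents;

/* One node in one relation; parent/daughter are only used in trees. */
typedef struct item_s {
    contents *c;
    struct item_s *next, *prev, *parent, *daughter;
} item;

/* Record a feature value (fixed capacity, sketch only). */
static void contents_set(contents *c, const char *name, const char *val)
{
    assert(c->nfeats < MAX_FEATS);
    c->feat_name[c->nfeats] = name;
    c->feat_val[c->nfeats] = val;
    c->nfeats++;
}

/* Register item i as c's representative in relation rel. */
static void contents_add_rel(contents *c, const char *rel, item *i)
{
    assert(c->nrels < MAX_RELS);
    c->rel_name[c->nrels] = rel;
    c->rel_item[c->nrels] = i;
    c->nrels++;
    i->c = c;
}

static const char *item_feat(const item *i, const char *name)
{
    int k;
    for (k = 0; k < i->c->nfeats; k++)
        if (strcmp(i->c->feat_name[k], name) == 0) return i->c->feat_val[k];
    return NULL;
}

/* The same object's item in relation rel, or NULL if it is not there. */
static item *item_as(const item *i, const char *rel)
{
    int k;
    for (k = 0; k < i->c->nrels; k++)
        if (strcmp(i->c->rel_name[k], rel) == 0) return i->c->rel_item[k];
    return NULL;
}

/* Every dot-separated component but the last is a directive; the last
   names a feature. Returns NULL if any step falls off the structure. */
const char *path_to_feat(const item *start, const char *path)
{
    char buf[128], *tok, *dot;
    const item *i = start;
    if (strlen(path) >= sizeof(buf)) return NULL;
    strcpy(buf, path);
    tok = buf;
    while ((dot = strchr(tok, '.')) != NULL) {
        *dot = '\0';
        if (strcmp(tok, "n") == 0) i = i->next;
        else if (strcmp(tok, "p") == 0) i = i->prev;
        else if (strcmp(tok, "parent") == 0) i = i->parent;
        else if (strcmp(tok, "daughter") == 0) i = i->daughter;
        else if (strncmp(tok, "R:", 2) == 0) i = item_as(i, tok + 2);
        else return NULL; /* unknown directive */
        if (i == NULL) return NULL;
        tok = dot + 1;
    }
    return item_feat(i, tok);
}
```

With a Segment list of two segments, each a daughter of a different syllable in SylStructure, the example path evaluated from the first segment returns the stress of the second segment's syllable.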
CART trees, for example, use pathnames in their questions to specify which feature is being queried. If Flite is to support easy conversion of statistical models from Festival, pathname support is the right choice.
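To show how a CART question refers to a feature by pathname, here is a minimal sketch of tree evaluation. Each internal node holds a pathname, an operator and a comparison value, and the feature lookup is passed in as a function so that an HRG pathname resolver could be plugged in. The node layout, the two operators shown (string equality and numeric less-than), treating an unresolvable path as a "no" answer, and the table-based stand-in `table_feat` are all assumptions for illustration, not Festival's or Flite's CART format.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical feature lookup: given an item and a pathname such as
   "n.R:SylStructure.parent.stress", return the value as a string,
   or NULL if the path cannot be followed. */
typedef const char *(*path_fn)(const void *item, const char *path);

typedef enum { CART_IS, CART_LESS, CART_LEAF } cart_op;

/* Nodes live in an array; yes/no are indices of the child nodes. */
typedef struct {
    cart_op op;
    const char *path;   /* pathname naming the feature being questioned */
    const char *value;  /* comparison value, or the answer at a leaf */
    int yes, no;
} cart_node;

/* Walk from node 0 until a leaf is reached and return its answer. */
const char *cart_predict(const cart_node *tree, const void *item, path_fn feat)
{
    int n = 0;
    assert(tree != NULL && feat != NULL);
    while (tree[n].op != CART_LEAF) {
        const char *v = feat(item, tree[n].path);
        int ans = 0; /* assumption: a missing feature answers "no" */
        if (v != NULL) {
            if (tree[n].op == CART_IS)
                ans = strcmp(v, tree[n].value) == 0;
            else
                ans = atof(v) < atof(tree[n].value);
        }
        n = ans ? tree[n].yes : tree[n].no;
    }
    return tree[n].value;
}

/* Stand-in for a real pathname resolver: the "item" is an array of
   {pathname, value} pairs terminated by a NULL pathname. */
typedef struct { const char *path, *value; } path_value;

static const char *table_feat(const void *item, const char *path)
{
    const path_value *pv = item;
    for (; pv->path != NULL; pv++)
        if (strcmp(pv->path, path) == 0) return pv->value;
    return NULL;
}
```

Because the tree stores only pathnames, a model trained in Festival could in principle be emitted as such a static array and compiled directly, which is the conversion route the text argues for.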