Flite is the core library. For synthesis, this library requires three further parts to make a complete synthesizer.
Unlike in Festival, voice definitions are explicitly attached to each utterance as it is created. In Festival there is a notion of a ``current voice'' accessed through a global variable, which is not thread-safe. In Flite, all top-level synthesis routines require a voice as an argument, which is then attached as a feature to each created utterance. A voice definition also specifies how synthesis is to proceed: it supplies a C function that calls the necessary sub-functions for tokenization, lexical access, prosody, etc. This means voices themselves can specify what steps are required for rendering text as speech. Although Festival could support such a model, it does not by default.
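The design described above can be sketched in a few lines of C. This is an illustrative sketch only, not Flite's actual declarations: the types `struct voice` and `struct utterance`, the function `synthesize`, the step functions, and the two example voices are all hypothetical names invented for this example. It shows a voice being passed explicitly to the top-level routine, attached to the utterance at creation, and supplying the function that decides which synthesis steps run. No global ``current voice'' is involved.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical types for illustration; these are NOT Flite's real structures. */
struct utterance;

/* A voice supplies the function that drives synthesis for its utterances. */
typedef void (*synth_fn)(struct utterance *u);

struct voice {
    const char *name;
    synth_fn synth;              /* the voice decides which steps are run */
};

struct utterance {
    const char *text;
    const struct voice *voice;   /* attached when the utterance is created */
    char trace[64];              /* records which steps ran, for demonstration */
};

/* Append a step name to the utterance's trace without overflowing it. */
static void step(struct utterance *u, const char *name) {
    strncat(u->trace, name, sizeof u->trace - strlen(u->trace) - 1);
}

/* Placeholder sub-functions; real ones would do actual work. */
static void tokenize(struct utterance *u)       { step(u, "tok;"); }
static void lexical_lookup(struct utterance *u) { step(u, "lex;"); }
static void prosody(struct utterance *u)        { step(u, "pros;"); }

/* Two hypothetical voices that require different sets of steps. */
static void full_synth(struct utterance *u) {
    tokenize(u);
    lexical_lookup(u);
    prosody(u);
}

static void minimal_synth(struct utterance *u) {
    tokenize(u);
    lexical_lookup(u);
}

const struct voice voice_full    = { "full", full_synth };
const struct voice voice_minimal = { "minimal", minimal_synth };

/* Top-level routine: the voice is a required argument, never a global. */
struct utterance synthesize(const char *text, const struct voice *v) {
    assert(text != NULL && v != NULL && v->synth != NULL);
    struct utterance u = { text, v, "" };
    v->synth(&u);
    return u;
}
```

Because every call names its voice and all state lives in the returned utterance, two threads could call `synthesize` with different voices without sharing any mutable data, which is the property the global-variable approach lacks.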
A voice definition consists of a set of feature-value pairs that set voice-specific aspects such as the models for prosody, the unit selection database to use, the lexicon, etc. The equivalent in Festival is not so neat (though this model was discussed as a method for Festival at various times).
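A set of feature-value pairs of this kind might look roughly like the following. This is a minimal sketch under stated assumptions: `struct feature_set`, `demo_feat_set`, and `demo_feat_get` are hypothetical names chosen so as not to be confused with Flite's own feature functions, the values are plain strings, and the fixed capacity is arbitrary. The feature names used in the usage note are likewise made up.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define MAX_FEATURES 16   /* arbitrary capacity chosen for this sketch */

/* One name/value pair; both strings are borrowed, not copied. */
struct feature {
    const char *name;
    const char *value;
};

/* Hypothetical container for a voice's settings; not Flite's real type. */
struct feature_set {
    struct feature items[MAX_FEATURES];
    size_t count;
};

/* Set a feature, overwriting any existing value of the same name.
   Returns 0 on success, -1 if the set is full. */
int demo_feat_set(struct feature_set *fs, const char *name, const char *value) {
    assert(fs != NULL && name != NULL);
    for (size_t i = 0; i < fs->count; i++) {
        if (strcmp(fs->items[i].name, name) == 0) {
            fs->items[i].value = value;
            return 0;
        }
    }
    if (fs->count == MAX_FEATURES) {
        return -1;
    }
    fs->items[fs->count].name = name;
    fs->items[fs->count].value = value;
    fs->count++;
    return 0;
}

/* Look up a feature by name; returns fallback when it is not present. */
const char *demo_feat_get(const struct feature_set *fs, const char *name,
                          const char *fallback) {
    assert(fs != NULL && name != NULL);
    for (size_t i = 0; i < fs->count; i++) {
        if (strcmp(fs->items[i].name, name) == 0) {
            return fs->items[i].value;
        }
    }
    return fallback;
}
```

A voice built this way would start from an empty set (`struct feature_set fs = { .count = 0 };`) and then record its choices, for example `demo_feat_set(&fs, "lexicon", "lex_a")`. Synthesis steps would read those settings back with `demo_feat_get`, so swapping a lexicon or prosody model means changing one pair and leaving the code alone.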