A common interface for text-to-speech providers.

D-Bus services that implement this interface can provide speech synthesis to applications that use a client library, such as libspiel.

For discoverability, the service’s well-known name must end with Speech.Provider, for example org.espeak.Speech.Provider.
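The naming rule above can be sketched as a simple suffix check over a list of bus names. This is a minimal illustration, not part of the interface: the helper name `is_speech_provider` and the sample names are hypothetical, and the list would normally come from the bus itself.

```python
# Suffix taken from the text above; the helper name is hypothetical.
PROVIDER_SUFFIX = "Speech.Provider"


def is_speech_provider(bus_name: str) -> bool:
    """Return True if a well-known bus name follows the provider naming rule."""
    return bus_name.endswith(PROVIDER_SUFFIX)


# Hypothetical sample of names, standing in for what the bus would report.
names = [
    "org.espeak.Speech.Provider",
    "org.freedesktop.DBus",
    "org.example.Speech.Provider.Extra",
]
providers = [name for name in names if is_speech_provider(name)]
```

Only names where the suffix is at the very end qualify, so the third sample name is filtered out.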



    Voices readable a(ssstas)

A list of voices the provider can use.

Each voice in the array is a structure with the following members, in order:

  • A human-readable name
  • A unique identifier
  • Synthesis output format
  • A voice features bit field
  • A list of languages the voice supports, represented as BCP 47 tags
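As a sketch of how a client might handle this property, the snippet below maps each `(ssstas)` structure onto a named tuple in the member order listed above. The field names, the `parse_voices` helper, and the sample values (including the format string) are illustrative assumptions, not values defined by the interface.

```python
from typing import List, NamedTuple, Sequence, Tuple


class Voice(NamedTuple):
    """One entry of the Voices property; field names are hypothetical."""
    name: str            # s: human-readable name
    identifier: str      # s: unique identifier
    output_format: str   # s: synthesis output format
    features: int        # t: voice features bit field (unsigned 64-bit)
    languages: List[str] # as: BCP 47 language tags


def parse_voices(raw: Sequence[Tuple[str, str, str, int, Sequence[str]]]) -> List[Voice]:
    """Convert already-unpacked a(ssstas) data into Voice records."""
    return [
        Voice(name, identifier, fmt, features, list(langs))
        for (name, identifier, fmt, features, langs) in raw
    ]


# Placeholder data shaped like the property value; not real provider output.
sample = [("English (America)", "en-us", "audio/x-raw", 0, ["en-US", "en"])]
voices = parse_voices(sample)
```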


    Name readable s

A localizable, human-readable name for this provider.



    Synthesize (
      IN pipe_fd h,
      IN text s,
      IN voice_id s,
      IN pitch d,
      IN rate d,
      IN is_ssml b,
      IN language s
    )

This is the basic synthesis method. When called, the speech provider writes the synthesized output to the given file descriptor. Depending on the voice’s advertised output format, the output is either raw audio or a composite stream of audio and events.

Providers should be capable of synthesizing more than one request concurrently.

  • pipe_fd: File descriptor of the pipe to write the output to.

  • text: The text to be spoken.

  • voice_id: The identifier of the voice to speak with.

  • pitch: The pitch at which the text should be spoken.

  • rate: The rate at which the text should be spoken.

  • is_ssml: True if the text should be interpreted as an SSML snippet.

  • language: The language the utterance should be spoken in. Some voices support more than one language.
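The pipe handshake described above can be sketched from the client side: create a pipe, hand the write end to the provider, and read from the read end until the provider closes it. To stay self-contained, this example does not talk to D-Bus at all. `fake_provider_synthesize` is a hypothetical stand-in that runs in a thread and writes the text bytes instead of audio, and the pitch and rate values of 1.0 are placeholders rather than documented defaults.

```python
import os
import threading


def fake_provider_synthesize(pipe_fd, text, voice_id, pitch, rate, is_ssml, language):
    """Hypothetical stand-in for a provider: same arguments as Synthesize.

    A real provider would write synthesized audio; this writes the text bytes.
    """
    # Closing the descriptor signals end of output to the reader.
    with os.fdopen(pipe_fd, "wb") as out:
        out.write(text.encode("utf-8"))


def synthesize_to_bytes(text, voice_id, language):
    """Create a pipe, pass the write end to the provider, and read until EOF."""
    read_fd, write_fd = os.pipe()
    worker = threading.Thread(
        target=fake_provider_synthesize,
        # pitch=1.0 and rate=1.0 are placeholder values, not documented defaults.
        args=(write_fd, text, voice_id, 1.0, 1.0, False, language),
    )
    worker.start()
    chunks = []
    with os.fdopen(read_fd, "rb") as src:
        while True:
            chunk = src.read(4096)
            if not chunk:  # empty read means the write end was closed
                break
            chunks.append(chunk)
    worker.join()
    return b"".join(chunks)


data = synthesize_to_bytes("hello", "en-us", "en-US")
```

With a real provider, the descriptor is passed over D-Bus and the client would close its own copy of the write end after the call returns, so that the read loop sees end-of-file once the provider finishes.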