Long before the invention of signal processing, some people tried to build machines to emulate human speech. Some early legends of the existence of "Brazen Heads" involved Pope Silvester II (d. 1003 AD), Albertus Magnus (1198–1280), and Roger Bacon (1214–1294).
In 1779, Christian Gottlieb Kratzenstein won first prize in a competition announced by the Russian Imperial Academy of Sciences and Arts for models he built of the human vocal tract that could produce the five long vowel sounds (in International Phonetic Alphabet notation: [aː], [eː], [iː], [oː] and [uː]).
There followed the "acoustic-mechanical speech machine" of Wolfgang von Kempelen of Hungary, described in a 1791 paper. This machine added models of the tongue and lips, enabling it to produce consonants as well as vowels. In 1837, Charles Wheatstone produced a "speaking machine" based on von Kempelen's design, and in 1846, Joseph Faber exhibited the "Euphonia". In 1923, Paget resurrected Wheatstone's design.
In the 1930s, Bell Labs developed the vocoder, which automatically analyzed speech into its fundamental tones and resonances. From his work on the vocoder, Homer Dudley developed a keyboard-operated voice synthesizer called The Voder (Voice Demonstrator), which he exhibited at the 1939 New York World's Fair.
Dr. Franklin S. Cooper and his colleagues at Haskins Laboratories built the Pattern playback in the late 1940s and completed it in 1950. There were several different versions of this hardware device; only one currently survives. The machine converts pictures of the acoustic patterns of speech in the form of a spectrogram back into sound. Using this device, Alvin Liberman and colleagues discovered acoustic cues for the perception of phonetic segments (consonants and vowels).
MUSA was one of the first speech synthesis systems to be released. It consisted of stand-alone computer hardware and specialized software that enabled it to read Italian. A second version, released in 1978, was also able to sing Italian in an "a cappella" style.
Dominant systems in the 1980s and 1990s were the DECtalk system, based largely on the work of Dennis Klatt at MIT, and the Bell Labs system; the latter was one of the first multilingual language-independent systems, making extensive use of natural language processing methods.
Early electronic speech synthesizers sounded robotic and were often barely intelligible. The quality of synthesized speech has steadily improved, but as of 2016, output from contemporary speech synthesis systems remains clearly distinguishable from actual human speech.
Kurzweil predicted in 2005 that as improvements in the cost-performance ratio made speech synthesizers cheaper and more accessible, more people would benefit from the use of text-to-speech programs.
The first computer-based speech-synthesis systems originated in the late 1950s. Noriko Umeda et al. developed the first general English text-to-speech system in 1968 at the Electrotechnical Laboratory, Japan.
In 1961, physicist John Larry Kelly, Jr. and a colleague used an IBM 704 computer to synthesize speech, an event among the most prominent in the history of Bell Labs. Kelly's voice recorder synthesizer (vocoder) recreated the song "Daisy Bell", with musical accompaniment from Max Mathews. Coincidentally, Arthur C. Clarke was visiting his friend and colleague John Pierce at the Bell Labs Murray Hill facility at the time. Clarke was so impressed by the demonstration that he used it in the climactic scene of his screenplay for his novel 2001: A Space Odyssey, where the HAL 9000 computer sings the same song as astronaut Dave Bowman puts it to sleep. Despite the success of purely electronic speech synthesis, research into mechanical speech synthesizers continues.
Handheld electronics featuring speech synthesis began emerging in the 1970s. One of the first was the Telesensory Systems Inc. (TSI) Speech+ portable calculator for the blind in 1976. Other devices had primarily educational purposes, such as the Speak & Spell toy produced by Texas Instruments in 1978. Fidelity released a speaking version of its electronic chess computer in 1979.
The first video game to feature speech synthesis was the 1980 shoot 'em up Stratovox (known in Japan as Speak & Rescue). The first personal computer game with speech synthesis was Manbiki Shoujo (Shoplifting Girl), released in 1980 for the PET 2001, for which the game's developer, Hiroshi Suzuki, developed a "zero cross" programming technique to produce a synthesized speech waveform.
Another early example, the arcade version of Berzerk, also dates from 1980. The Milton Bradley Company produced the first multi-player electronic game using voice synthesis, Milton, in the same year.