The Raspberry Pi Powered Speaking Doorbell – Part 3: Text to Speech

In Part 1 we looked at a simple input circuit to isolate our Raspberry Pi from the doorbell circuit, and in Part 2 we made a camera overlay appear in Kodi. Next, we’ll build the text to speech server.

Please note that this blog post uses code snippets from my GitHub project. You will need to clone or download the full source code to run the examples.

With my home setup, I have a dedicated media PC in the lounge which runs Kodi on Windows. It is connected to a Yamaha receiver which is permanently on. The doorbell circuit, however, is connected to a Raspberry Pi. In my case, it makes sense to have the audio for text to speech play over the media PC. But how do we trigger text to speech on the media PC from the Raspberry Pi when someone presses the doorbell?

To solve this problem, I built a simple text to speech handler using the Tornado web server, which runs on the media PC in the lounge. When the doorbell switch is pressed, the Raspberry Pi makes an HTTP POST request to the text to speech server, which then speaks the given text over the Yamaha receiver.
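To make the request flow concrete, here is a minimal sketch of the same HTTP contract using only Python’s standard library (http.server) instead of Tornado. It is not the project’s code: speak() is a hypothetical hook that only records the text where the real server would call pyttsx. The endpoint path /text_to_speech and the form field text come from the client script later in this post.

```python
# Stand-in for the text to speech server, standard library only.
# ASSUMPTION: http.server replaces Tornado purely to illustrate the request
# flow; speak() is a hypothetical hook where pyttsx would be called.
import threading
import urllib.parse
import urllib.request
from http.server import BaseHTTPRequestHandler, HTTPServer

spoken = []  # records what would have been spoken, for demonstration


def speak(text):
    """Hypothetical hook: the real server would pass text to pyttsx here."""
    spoken.append(text)


class TextToSpeechRequestHandler(BaseHTTPRequestHandler):
    def do_POST(self):
        if self.path != '/text_to_speech':
            self.send_error(404)
            return
        # Read the form-encoded body, e.g. "text=There+is+a+visitor..."
        length = int(self.headers.get('Content-Length', 0))
        body = self.rfile.read(length).decode('utf-8')
        fields = urllib.parse.parse_qs(body)
        if 'text' not in fields:
            self.send_error(400, 'Missing argument: text')
            return
        speak(fields['text'][0])
        self.send_response(200)
        self.end_headers()

    def log_message(self, format, *args):
        pass  # keep the demo quiet


def run_demo():
    """Start the server on a free port, POST one message, return the status."""
    server = HTTPServer(('127.0.0.1', 0), TextToSpeechRequestHandler)
    thread = threading.Thread(target=server.serve_forever, daemon=True)
    thread.start()
    host, port = server.server_address
    data = urllib.parse.urlencode(
        {'text': 'There is a visitor at the front door.'}).encode('utf-8')
    url = 'http://%s:%d/text_to_speech' % (host, port)
    # Passing data makes urlopen send a form-encoded POST, like the Pi does
    with urllib.request.urlopen(url, data=data) as response:
        status = response.status
    server.shutdown()
    server.server_close()
    return status
```

Calling run_demo() starts the server on a free local port, posts one message the way the Raspberry Pi would, and returns the HTTP status code; spoken then holds the text the server received.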

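from lib import handler
import pyttsx

class TextToSpeechHandler(handler.Handler):
    def post(self):
        # The text to speak, sent as a form-encoded POST argument
        text = self.get_argument('text')

        engine = pyttsx.init()

        # Speed of speech, read from the config file
        engine.setProperty('rate', self._registry['config'].getint(
            'text_to_speech.rate'))

        # Volume from 0.0 (silent) to 1.0 (full)
        engine.setProperty('volume', self._registry['config'].getfloat(
            'text_to_speech.volume'))

        # Use the first installed voice whose ID contains the configured
        # text, ignoring case
        voices = engine.getProperty('voices')
        for voice in voices:
            if voice.id.lower().find(self._registry['config'].get(
                    'text_to_speech.voice').lower()) != -1:
                engine.setProperty('voice', voice.id)
                break

        # Queue the text and block until it has been spoken
        engine.say(text)
        engine.runAndWait()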

We define a handler, “TextToSpeechHandler”, which accepts HTTP POST requests and converts an argument called “text” to speech. It inherits from my base Handler class (which contains functionality common to all my handlers, including access to the registry via self._registry), which in turn inherits from Tornado’s standard RequestHandler.

For text to speech, we’ll use the pyttsx package. I have made three parameters configurable: the rate (how fast the text is spoken), the volume, and the voice to use (the first installed voice whose ID contains the configured text, matched case-insensitively). My configuration looks like this:

#System finds first voice ID that contains the below text, case insensitive
text_to_speech.voice = Hazel

#Speed of speech. 100 is "normal" speed
text_to_speech.rate = 130

#Volume. 1.0 is full volume, 0.0 is no volume.
text_to_speech.volume = 1.0
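To show what these three settings mean in code, here is a small stand-in for the config object and the voice match. SimpleConfig is hypothetical: it mirrors only the get, getint and getfloat calls that appear in this post’s snippets, not the project’s actual config class. The voice IDs are illustrative strings in the style of Windows SAPI5 voice tokens, not values read from a real machine.

```python
# ASSUMPTION: hypothetical stand-in for the project's config object. It parses
# "key = value" lines, skips "#" comments, and offers typed getters named like
# the calls in the handler (get, getint, getfloat).


class SimpleConfig:
    def __init__(self, text):
        self._values = {}
        for raw_line in text.splitlines():
            line = raw_line.strip()
            if not line or line.startswith('#'):
                continue  # skip blank lines and comments
            key, _, value = line.partition('=')
            self._values[key.strip()] = value.strip()

    def get(self, key):
        return self._values[key]

    def getint(self, key):
        return int(self._values[key])

    def getfloat(self, key):
        return float(self._values[key])


def find_voice_id(voice_ids, wanted):
    """Return the first voice ID containing `wanted`, ignoring case, else None."""
    wanted = wanted.lower()
    for voice_id in voice_ids:
        if wanted in voice_id.lower():
            return voice_id
    return None


CONFIG_TEXT = """
#System finds first voice ID that contains the below text, case insensitive
text_to_speech.voice = Hazel

#Speed of speech. 100 is "normal" speed
text_to_speech.rate = 130

#Volume. 1.0 is full volume, 0.0 is no volume.
text_to_speech.volume = 1.0
"""

config = SimpleConfig(CONFIG_TEXT)

# Illustrative voice IDs in the style of SAPI5 tokens (made up for this demo)
available = [
    r'HKEY_LOCAL_MACHINE\SOFTWARE\Microsoft\Speech\Voices\Tokens\TTS_MS_EN-US_ZIRA_11.0',
    r'HKEY_LOCAL_MACHINE\SOFTWARE\Microsoft\Speech\Voices\Tokens\TTS_MS_EN-GB_HAZEL_11.0',
]

chosen = find_voice_id(available, config.get('text_to_speech.voice'))
```

Here chosen is the Hazel entry; in the handler, the matching ID is what gets passed to engine.setProperty('voice', ...). If nothing matches, the engine simply keeps its default voice.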

Here is an example script that performs an HTTP POST to the text to speech server:

from lib.bootstrap import Bootstrap
import requests

bootstrap = Bootstrap('default', ['config', 'log'])
registry = bootstrap.bootstrap()

text_to_speech_hosts = registry["config"].get('text_to_speech.hosts')

text = 'There is a visitor at the front door.'

for host_and_port in text_to_speech_hosts:
    url = 'http://' + host_and_port + '/text_to_speech'
    payload = {"text": text}
    # Form-encoded POST, which the handler reads with get_argument('text')
    requests.post(url, data=payload)

We define the text we want to convert to speech in the “text” variable. Then, for each configured text to speech server, we POST the text as a form-encoded “text” field, which is exactly what the handler’s get_argument('text') call reads.
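If you would rather not depend on requests, the same loop can be sketched with the standard library’s urllib. This is an alternative, not the project’s script: the host address is made up, and send_all() adds per-host error handling that the original loop lacks, so one unreachable server does not stop the others from being contacted.

```python
# Standard-library sketch of the client loop (the original script uses requests).
# ASSUMPTION: hosts is a list of "host:port" strings; the address below is made up.
import urllib.parse
import urllib.request


def build_requests(hosts, text):
    """Build one form-encoded POST request per text to speech server."""
    body = urllib.parse.urlencode({'text': text}).encode('utf-8')
    built = []
    for host_and_port in hosts:
        url = 'http://' + host_and_port + '/text_to_speech'
        built.append(urllib.request.Request(
            url,
            data=body,
            headers={'Content-Type': 'application/x-www-form-urlencoded'},
            method='POST'))
    return built


def send_all(requests_to_send, timeout=5):
    """Send each request and return (url, status or error) pairs."""
    results = []
    for req in requests_to_send:
        try:
            with urllib.request.urlopen(req, timeout=timeout) as response:
                results.append((req.full_url, response.status))
        except OSError as error:  # URLError and HTTPError subclass OSError
            results.append((req.full_url, error))
    return results


hosts = ['192.168.1.50:8080']  # hypothetical media PC address
reqs = build_requests(hosts, 'There is a visitor at the front door.')
print(reqs[0].full_url)  # http://192.168.1.50:8080/text_to_speech
print(reqs[0].data)      # b'text=There+is+a+visitor+at+the+front+door.'
```

Calling send_all(reqs) would then contact each server in turn; the printed body shows the form encoding the Tornado handler expects.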

In the next part, we’ll look at some other utility libraries before digging into the actual doorbell code.
