A friend of mine, a certain Barry, came up with the idea of a speaker that reads aloud the messages you send to it. Hence the name of the speaker: BarryBox™.

The BarryBox is a speaker that can be accessed directly over MQTT. It uses Google Translate to generate the Text-to-Speech (TTS) audio, so it can be used in any language Google Translate supports. The BarryBox can play any audio stream, but in the frontend I've limited that to sounds from a soundboard. The project as a whole consists of three parts: the BarryBox itself, a frontend, and a backend that acts as a bridge between the two.

The backend is designed so that any BarryBox that connects to the MQTT server can automatically be reached through the frontend.

The BarryBox could also be used without a (public) frontend, e.g. as a TTS notifier for a smart home, a speaker that reads aloud tweets from a certain account, a WiFi radio, etc. In this post I'll focus on having it read messages sent to it through a frontend.

The system is built so that it can:

  • Read aloud text sent to it through a website;
  • Function as a soundboard;
  • Queue messages / sounds;
  • Support multiple BarryBoxes (each with a unique username);
  • Skip the current message / sound with a button press;
  • Clear the queue with a long button press;
  • Create logs.

System overview

graph TB
  A(User) --> |Sends message / sound| B
  subgraph Frontend
    B[https://barrybox.nl]
  end
  subgraph Backend
    B --> |POST|C[Node-RED]
    C --> D[(Database)]
  end
  C <--> E[MQTT server]
  E --> F(BarryBox)

Hardware

The device is built around an ESP32 that sends audio data to a PCM5102 Digital-to-Analog Converter (DAC) over I2S. Strictly speaking, you do not need an external DAC, because the ESP32 has one built in on GPIO25 and GPIO26 (video example). However, using the PCM5102 (or a similar external DAC) drastically improves the sound quality, because the built-in DAC only has 8-bit resolution. I chose the ESP32 because the ESP32 I2S audio library I use is better equipped than its ESP8266 counterpart; I elaborate on that in the software section.
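To put rough numbers on the 8-bit limitation: the textbook signal-to-quantization-noise ratio (SQNR) of an ideal N-bit converter playing a full-scale sine wave is about 6.02N + 1.76 dB. This is an idealized bound, not a measurement of either DAC, and the 16-bit row assumes 16-bit samples are sent to the PCM5102 over I2S.

```latex
\begin{aligned}
\mathrm{SQNR}_{\text{ideal}} &\approx 6.02\,N + 1.76\ \text{dB} \\
N = 8:\quad \mathrm{SQNR} &\approx 6.02 \cdot 8 + 1.76 \approx 49.9\ \text{dB} \\
N = 16:\quad \mathrm{SQNR} &\approx 6.02 \cdot 16 + 1.76 \approx 98.1\ \text{dB}
\end{aligned}
```

Even in this idealized form, the 8-bit path has a noise floor roughly 48 dB higher, which is consistent with the audible difference.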

I added a small push button for skipping audio streams. The circuit diagram is shown below.

Circuit diagram

I soldered some headers onto a perfboard and wired them up according to the circuit. The button is connected via a terminal so that the board can be taken out of its enclosure if needed.

I drilled holes in the plastic enclosure for the button, the AUX audio output and the USB power input. I glued the button against the plastic so that it could be pressed.
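The button has two jobs from the feature list: a short press skips the current item and a long press clears the queue. One minimal way to tell them apart is to time each press and classify it on release. The sketch below is plain C++ so it can be tested off-device; the thresholds (30 ms, 1 s) and the names ButtonClassifier / ButtonAction are hypothetical, not taken from the BarryBox code.

```cpp
#include <cstdint>

// Hypothetical thresholds; the post does not state the values actually used.
constexpr uint32_t kDebounceMs = 30;     // presses shorter than this are treated as contact bounce
constexpr uint32_t kLongPressMs = 1000;  // presses at least this long clear the queue

enum class ButtonAction { None, Skip, ClearQueue };

// Tracks one push button. Call update() regularly with the current time and
// whether the button is pressed; an action is reported when the button is released.
class ButtonClassifier {
 public:
  ButtonAction update(uint32_t nowMs, bool pressed) {
    if (pressed && !wasPressed_) {
      pressStartMs_ = nowMs;  // transition released -> pressed: start timing
    }
    ButtonAction action = ButtonAction::None;
    if (!pressed && wasPressed_) {  // transition pressed -> released: classify the press
      uint32_t held = nowMs - pressStartMs_;  // unsigned subtraction survives millis() wraparound
      if (held >= kLongPressMs) {
        action = ButtonAction::ClearQueue;
      } else if (held >= kDebounceMs) {
        action = ButtonAction::Skip;
      }
    }
    wasPressed_ = pressed;
    return action;
  }

 private:
  bool wasPressed_ = false;
  uint32_t pressStartMs_ = 0;
};
```

On the ESP32 you would call update() from loop() with millis() and the pin state, for example digitalRead(pin) == LOW if the button is wired to ground with INPUT_PULLUP (an assumption; check it against your circuit). Because the action is reported on release, a long press clears the queue when the button is let go rather than after exactly one second.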

ESP32 Software

The BarryBox uses the ESP32-audioI2S library. I first tried to get everything working on an ESP8266, but the ESP8266 audio library had trouble handling Google TTS streams.

As in most of my projects, I used WiFiManager, which lets users set their WiFi credentials easily. Because every BarryBox needs a username, I added an extra parameter field to the configuration portal. An easy way to do that is explained in the WiFiManager README under Custom Parameters.

The BarryBox can handle three kinds of input: a TTS message, a sound from the soundboard, or a link to an audio stream (the last one is not available in the frontend). To differentiate between them, I implemented a class streamable. Its fields are currently Arduino String objects, which are known to be inefficient in terms of memory because they allocate on the heap and can fragment it. I set the maximum queue size to 20; once the queue is full, incoming messages are ignored until it contains fewer than 20 items again.
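A minimal sketch of the streamable idea and the 20-item queue, assuming a std::deque as backing store. The enum values, field names and StreamQueue are hypothetical; the real class uses Arduino String fields. Using std::string here only makes the code compilable and testable off-device: it does not fix the memory concern, since it also allocates on the heap (fixed-size char buffers would be the usual way around that).

```cpp
#include <cstddef>
#include <deque>
#include <string>
#include <utility>

// Hypothetical tags for the three input kinds described above.
enum class StreamType { Tts, Soundboard, Url };

struct Streamable {
  StreamType type;
  std::string payload;   // text to speak, sound name, or stream URL
  std::string language;  // only meaningful for Tts, e.g. "en"
};

// Bounded FIFO: once it holds kMaxItems entries, further pushes are rejected.
class StreamQueue {
 public:
  static constexpr std::size_t kMaxItems = 20;  // queue limit from the post

  bool push(Streamable item) {
    if (items_.size() >= kMaxItems) {
      return false;  // full: the incoming message is ignored
    }
    items_.push_back(std::move(item));
    return true;
  }

  // Removes and returns the oldest item; the caller must check empty() first.
  Streamable pop() {
    Streamable next = std::move(items_.front());
    items_.pop_front();
    return next;
  }

  void clear() { items_.clear(); }  // e.g. on a long button press
  bool empty() const { return items_.empty(); }
  std::size_t size() const { return items_.size(); }

 private:
  std::deque<Streamable> items_;
};
```

In loop(), the box would pop the next item whenever the player is idle and start the matching playback; push() returning false corresponds to the "ignore when full" rule.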

You can find the full code here.

Backend

The backend consists of a database and an API. I built the API in Node-RED, but an alternative (PHP, Node.js, etc.) would of course work just as well. My Node-RED flow (import here) looks as follows:

Implementation in Node-RED

It features two HTTP endpoints. Both require a valid API key in the api-key request header.

  • POST: accepts a JSON body representing either a TTS message (type "say") or a sound from the soundboard (type "soundboard"), and forwards it to the BarryBox over MQTT:
{
  "client": "<THE USERNAME OF THE BOX YOU'RE SENDING TO>",
  "type": "say",
  "language": "en",
  "text": "Hello world"
}
{
  "client": "<THE USERNAME OF THE BOX YOU'RE SENDING TO>",
  "type": "soundboard",
  "text": "<NAME OF THE SOUND>"
}
  • GET: only requires the client as a query-string parameter (e.g. https://...&client=jos). It returns the box's current status (online or offline) and its alias.
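Because the POST endpoint just takes a JSON body, anything that can make an HTTP request can drive a BarryBox. As a sketch, the helpers below (jsonEscape, sayBody and soundboardBody are hypothetical names, not part of the project) build the two body formats shown above with proper escaping; you would send the result with the HTTP client of your choice, with the api-key header set.

```cpp
#include <cstdio>
#include <string>

// Escapes a string for use inside a JSON string literal.
// Bytes >= 0x80 (UTF-8 sequences) are passed through unchanged, which is valid JSON.
std::string jsonEscape(const std::string& in) {
  std::string out;
  for (unsigned char c : in) {
    switch (c) {
      case '"':  out += "\\\""; break;
      case '\\': out += "\\\\"; break;
      case '\n': out += "\\n"; break;
      case '\r': out += "\\r"; break;
      case '\t': out += "\\t"; break;
      default:
        if (c < 0x20) {  // other control characters must be \u-escaped
          char buf[7];
          std::snprintf(buf, sizeof buf, "\\u%04x", static_cast<unsigned>(c));
          out += buf;
        } else {
          out += static_cast<char>(c);
        }
    }
  }
  return out;
}

// Body for a TTS message (type "say").
std::string sayBody(const std::string& client, const std::string& language,
                    const std::string& text) {
  return "{\"client\":\"" + jsonEscape(client) + "\",\"type\":\"say\",\"language\":\"" +
         jsonEscape(language) + "\",\"text\":\"" + jsonEscape(text) + "\"}";
}

// Body for a soundboard sound; the "text" field carries the sound's name.
std::string soundboardBody(const std::string& client, const std::string& sound) {
  return "{\"client\":\"" + jsonEscape(client) + "\",\"type\":\"soundboard\",\"text\":\"" +
         jsonEscape(sound) + "\"}";
}
```

Escaping matters because "text" is free-form user input: a quote or newline in a message would otherwise produce invalid JSON.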

The database is used for the following things:

  • Keeping logs of the messages sent through the POST endpoint;
  • Keeping track of all usernames (BarryBoxes). When a BarryBox connects to the MQTT server for the first time, Node-RED adds its username to the database;
  • Storing API keys. People with an API key are authorized to use the backend.

The database has the following tables:

API-keys (value, description)
BarryBoxClients (client, status, alias)
BarryBoxLogs (id, client, type, data, lang, timestamp)
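To make the role of the two BarryBox tables concrete, here is an in-memory sketch of the same bookkeeping: register a username on first connect, flip its status on connect/disconnect, answer the GET lookup, and append log rows. It is hypothetical throughout: how the Node-RED flow learns about connects and disconnects is not described here, defaulting the alias to the username is an assumption, and the real data lives in the database, not in a C++ map.

```cpp
#include <cstdint>
#include <map>
#include <optional>
#include <string>
#include <vector>

// Stand-in for a BarryBoxClients row (client is the map key).
struct ClientRow {
  std::string status;  // "online" or "offline"
  std::string alias;
};

// Stand-in for a BarryBoxLogs row.
struct LogRow {
  std::uint64_t id;
  std::string client, type, data, lang;
  std::int64_t timestamp;  // assumed: Unix seconds
};

class BarryBoxStore {
 public:
  // Adds the username on first sight (alias defaults to the username), then marks it online.
  void onConnect(const std::string& client) {
    auto [it, inserted] = clients_.try_emplace(client, ClientRow{"online", client});
    if (!inserted) it->second.status = "online";
  }

  void onDisconnect(const std::string& client) {
    auto it = clients_.find(client);
    if (it != clients_.end()) it->second.status = "offline";
  }

  // What the GET endpoint reports: status and alias, or nothing for unknown clients.
  std::optional<ClientRow> lookup(const std::string& client) const {
    auto it = clients_.find(client);
    if (it == clients_.end()) return std::nullopt;
    return it->second;
  }

  // Appends a log row; ids increase like an auto-increment column. Returns the new id.
  std::uint64_t log(const std::string& client, const std::string& type,
                    const std::string& data, const std::string& lang,
                    std::int64_t timestamp) {
    logs_.push_back(LogRow{nextId_, client, type, data, lang, timestamp});
    return nextId_++;
  }

  const std::vector<LogRow>& logs() const { return logs_; }

 private:
  std::map<std::string, ClientRow> clients_;
  std::vector<LogRow> logs_;
  std::uint64_t nextId_ = 1;
};
```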

Frontend

The frontend is written in PHP, HTML/CSS and JS (jQuery). All files in the soundboard subfolder are loaded into the soundboard. The information badges (e.g. online / offline) are updated automatically using AJAX. The controls are disabled for three seconds after sending a message to prevent spamming.
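The three-second lock is a simple cooldown. Below is a sketch of the same rule as a reusable check; Cooldown is a hypothetical helper, not part of the frontend code (which does this in JS). Since a client-side lock only applies to requests made through the page, the same check could also be applied on the backend if you want the limit to hold there.

```cpp
#include <cstdint>

// Allows an action at most once per cooldown window (3000 ms in the frontend).
class Cooldown {
 public:
  explicit Cooldown(std::uint32_t windowMs) : windowMs_(windowMs) {}

  // Returns true and starts a new window if no window is active; false otherwise.
  bool tryAcquire(std::uint32_t nowMs) {
    if (hasFired_ && nowMs - lastMs_ < windowMs_) return false;
    hasFired_ = true;
    lastMs_ = nowMs;
    return true;
  }

 private:
  std::uint32_t windowMs_;
  std::uint32_t lastMs_ = 0;
  bool hasFired_ = false;
};
```

A rejected attempt does not extend the window; the next message is allowed exactly 3 s after the last accepted one.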

You can find the full code of the frontend here.

Frontend Design

Demo

Demo: the BarryBox plays its startup sound and receives a few messages.

8 thoughts on “BarryBox: TTS WiFi Speaker”

  1. Hi!
    Is there an alternative to creating a webpage?
    Can the speech or sounds be loaded in another way?

    1. Hi! Yes, the webpage is just a layer on top of the system. The box is controlled over MQTT — you could hook up anything to it.

      1. Thanks for the quick reply!
        I can’t seem to find the library for “streamable.cpp”. Do you remember which library it is included in?

  2. Hi again!
    It is very difficult for me to see what I need to change in your backend code to make it work. Can you help me with that?
    Do you have any suggestions on how to change the node-red to make it work without a webpage?
    Thanks!

    1. Hi! Node-RED functions as an interface to the MQTT server. You could hook up anything to it by making a flow using MQTT nodes. What would you like to hook it up to? Do you have Node-RED running?

      (You could also use Home Assistant or the Paho library, by the way; I used Node-RED because I personally find it easy to use.)
