Long-Term Data Storage of ESPHome Sensor Data
Written on June 15th, 2023 by Kevin Ahrendt

Introduction
I currently have 26 devices running ESPHome that collect data from our home. The various devices include environmental sensors measuring air pressure, temperature, humidity, CO2, VOC levels, and particulate matter. Other devices collect power/energy usage using smart plugs or a whole-house power monitoring device (an Emporia Vue 2 with a custom component). Altogether, these devices generate a significant amount of raw data. That data is partially filtered and sent to my Home Assistant server. Home Assistant then stores the filtered data for about ten days; after that, it only keeps aggregates like the mean, minimum, and maximum over 1-hour intervals. I plan to collect all of this measurement data in a separate database for permanent storage. I will retain raw data for one month and permanently retain all data aggregated over one-minute intervals.
Possible Approaches
There are several established ways to accomplish this goal.
- Run InfluxDB (or VictoriaMetrics) and use Home Assistant’s InfluxDB integration to send data to it for long-term storage. There are many tutorials available for this setup.
  - Advantages:
    - Relatively easy to set up and use
    - InfluxDB can automatically downsample the data after a set time.
    - Widespread community use, so many resources are available for help
  - Disadvantages:
    - Requires Home Assistant to be running (generally not a problem, as restarts are rarely needed)
    - Requires all data to pass through Home Assistant, so for more complete data, every sensor needs a short update interval. I generally do not need such short update intervals for automations, so unnecessary data is sent to Home Assistant only to be passed on to InfluxDB.
    - Even with a short update interval, the sent data could easily miss a maximum value that occurs between updates.
    - Not all data sources (especially those that are not smart-home data) can easily be integrated into Home Assistant to be passed on to long-term storage.
- Use a data-scraping program like Telegraf to collect data directly from each device and send it to a database.
  - Advantages:
    - Removes Home Assistant from the long-term data collection process.
    - Telegraf can aggregate incoming data and provide typical summary statistics like the minimum, maximum, and mean.
  - Disadvantages:
    - Data is collected at a set interval (the default is five seconds), so any summary statistics are based on only one reading per interval and could easily miss a maximum value that occurs between collections.
    - Collecting data directly from ESPHome is challenging. There are several options:
      - Use the Prometheus ESPHome component to allow Telegraf to scrape sensor measurements
        - Requires using the Arduino core instead of the (generally) better-performing ESP-IDF framework
        - Adds significant overhead on the ESP device
      - Set up an MQTT broker like Mosquitto and configure ESPHome to send data to it using the MQTT component
        - Increases complexity.
        - Cannot specify which sensors use the native Home Assistant API and which use MQTT.
        - Overhead on the ESP device is minimal.
      - Use the HTTP Request component to send data directly to the database
        - Tedious configuration for a large number of sensors
        - Adds significant overhead on the ESP device
Current Plan
I will use a variation of the second option and overcome the listed disadvantages with some custom code. The long-term data will be stored using a [VictoriaMetrics cluster](https://docs.victoriametrics.com/Cluster-VictoriaMetrics.html). The cluster will run two separate storage instances: one will hold high-frequency data retained for one month, and the other will hold lower-frequency aggregated data retained permanently. Data will be collected from a Mosquitto MQTT broker using Telegraf’s MQTT consumer plugin. The ESPHome devices will aggregate their measurements and send them over MQTT for long-term storage.
![[Data Retention Flow Chart.png]]
Flow chart showing where data is sent from an ESPHome Device
We can break this down into three steps:
- Configure VictoriaMetrics and Telegraf for data collection and storage.
- Modify ESPHome’s MQTT code to restrict which sensors send measurements via the native API and which via MQTT (a hypothetical sketch of this routing rule follows this list).
  - Sensors with a long update interval are exposed only to Home Assistant.
  - Sensor measurement aggregates are sent only via MQTT.
- Develop a statistics component for ESPHome that aggregates the data over a sliding window quickly and efficiently (a sketch of one possible approach also follows the list).
  - ESPHome has built-in sensor filters for the minimum, maximum, and mean (among other statistics) over a sliding window. These work well for one or two sensors, but aggregating data for many sensors quickly runs into issues.
    - Each aggregate requires a Copy component with the appropriate filter applied, which makes configuring several different aggregates for many source sensors tedious.
    - The algorithms behind the built-in filters are inefficient in both memory and computation time; using them to compute statistics like the minimum, maximum, and mean for several sensors over large sliding windows can crash the device.
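To make the second step a little more concrete, here is a hypothetical sketch of the routing rule. It deliberately does not use ESPHome’s real MQTT or API classes (the actual change has to hook into ESPHome’s C++ internals); the sensor names, the `Route` enum, and the `publish` stand-in are all made up for illustration. The only point is the policy itself: raw readings go to Home Assistant over the native API, while windowed aggregates go out over MQTT for long-term storage.

```cpp
// Hypothetical illustration only -- this is not ESPHome's internal API.
// It demonstrates the routing policy: raw sensor readings go to
// Home Assistant over the native API, while sliding-window aggregates
// are published over MQTT for long-term storage.
#include <iostream>
#include <string>
#include <vector>

enum class Route { NATIVE_API, MQTT };

struct SensorConfig {
  std::string name;
  bool is_aggregate;  // true for sliding-window statistic sensors
};

// The routing rule described in the list above.
Route route_for(const SensorConfig &sensor) {
  return sensor.is_aggregate ? Route::MQTT : Route::NATIVE_API;
}

// Stand-in for the code that would actually publish a state update.
void publish(const SensorConfig &sensor, float state) {
  if (route_for(sensor) == Route::MQTT) {
    std::cout << "MQTT topic sensors/" << sensor.name << " <- " << state << "\n";
  } else {
    std::cout << "native API " << sensor.name << " <- " << state << "\n";
  }
}

int main() {
  std::vector<SensorConfig> sensors = {
      {"office_temperature", false},         // raw reading, long update interval
      {"office_temperature_max_1min", true}  // 1-minute aggregate
  };
  publish(sensors[0], 21.5f);
  publish(sensors[1], 22.0f);
}
```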
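As a taste of what “quickly and efficiently” can mean for the third step, below is a minimal, self-contained sketch of the classic monotonic-deque technique for sliding-window extrema, combined with a running sum for the mean. This is not the ESPHome component itself, and not necessarily the approach my component takes; it only shows that the window minimum and maximum can be maintained in amortized constant time per reading instead of rescanning the whole window on every update.

```cpp
// A minimal sketch (not the actual ESPHome component) of one standard way to
// maintain sliding-window aggregates cheaply: two monotonic deques give the
// window minimum and maximum in amortized O(1) per reading, and a running sum
// gives the mean, so no full rescan of the window is ever needed.
#include <cstddef>
#include <deque>
#include <iostream>

class SlidingWindowStats {
 public:
  explicit SlidingWindowStats(size_t window_size) : window_size_(window_size) {}

  void insert(float value) {
    // Discard older readings that can no longer be the window min/max.
    while (!min_queue_.empty() && min_queue_.back().value >= value)
      min_queue_.pop_back();
    while (!max_queue_.empty() && max_queue_.back().value <= value)
      max_queue_.pop_back();
    min_queue_.push_back({index_, value});
    max_queue_.push_back({index_, value});

    values_.push_back(value);
    sum_ += value;
    ++index_;

    // Evict the oldest reading once the window is full.
    if (values_.size() > window_size_) {
      size_t oldest = index_ - values_.size();
      sum_ -= values_.front();
      values_.pop_front();
      if (min_queue_.front().index == oldest) min_queue_.pop_front();
      if (max_queue_.front().index == oldest) max_queue_.pop_front();
    }
  }

  float min() const { return min_queue_.front().value; }
  float max() const { return max_queue_.front().value; }
  float mean() const { return static_cast<float>(sum_ / values_.size()); }

 private:
  struct Entry {
    size_t index;  // position of the reading in the overall stream
    float value;
  };

  size_t window_size_;
  size_t index_{0};  // total number of readings seen so far
  double sum_{0.0};  // running sum of the readings currently in the window
  std::deque<float> values_;
  std::deque<Entry> min_queue_, max_queue_;
};

int main() {
  SlidingWindowStats stats(3);  // aggregate over the 3 most recent readings
  const float readings[] = {21.5f, 22.0f, 21.0f, 23.5f, 22.5f};
  for (float reading : readings) {
    stats.insert(reading);
    std::cout << "min=" << stats.min() << " max=" << stats.max()
              << " mean=" << stats.mean() << "\n";
  }
  return 0;
}
```

The memory cost of this sketch is still proportional to the window size; the savings over rescanning the window for each output are in per-update computation.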
Future
I will write future posts describing my implementation of each of these steps. I will focus heavily on the custom statistics component I have developed, which is both computationally and memory efficient.