Data logging engine

In older versions this application used neDB for storing historical metric data logs. Since neDB keeps all data in memory, it gets slower over time as data grows. Indexes help a lot, but really, this was not a good way to store millions of data points.

I tried implementing a binary search for neDB to get the graph data, but it wasn’t making the cut, the fundamental storage engine of neDB was just too slow for this purpose and the performance was not a lot better, so I ditched that effort. I researched other methods to store timestamped sensor data. There are expensive enterprise engines and also open source toolchains like Apache Hadoop – nice but complex and a steep learning curve (Pi support?). There’s mongoDB which is wonderfully capable and similar to neDB (or neDB similar to mongo rather) but still not officially supported on the Pi so it’s a no-go for now.

Then there’s Timestore, a C++ app by Mike Stirling, which he pointed out to me after he released it and I was very excited that a fast small standalone IoT storage engine exists for the rest of us. It stores data in binary format at fixed intervals (so no timestamps stored, just a starting point, and each data point value). During extraction, each data point will have a calculated timestamp based on the seed time. It uses binary search tree, a fast search algorithm to query and only return a reasonable # of data points. It also supports storing averaged values and has other features. This is all great but I have an important requirement: I cannot assume when a data point comes in to be stored, it can be highly variable and I can have a thousand points in 1 hour, then nothing for a week. So I need to store the timestamp. Turns out the guys over at OpenEnergyMonitor have done a lot of work based on Timestore. They had the same problems to solve, getting a lot of data out of log databases, queried/aggregated and returned as fast as possible. Their EmonCMS app used MySQL to log sensor data, which was increasingly slow as tables got very deep. So they created PHP storage variations based off Timestore to allow logging variable interval data. Thanks guys for sharing your work and research results!

After considering all these options, I decided to build my own node.js storage engine for logging variable time based sensor data based on bits and pieces from Timestore and the variable interval engine from OEM. The main reason is that it needed to be all in node.js. Doing it in PHP and then hitting a PHP script from a node.js socket app to query/post data felt very wrong (not sure how performant either).

The result is a much faster binary storage engine that can query and aggregate weeks worth of data in ~500ms (on a Pi B Rev1 256RAM!) and ~80-200ms on a Pi 2 (quad 900mhz, 1GBRAM), and much faster on Pi3. That’s more than a magnitude faster than neDB which was increasingly slower as data accumulated.

Upgrading from old neDB gatewayLog.db data

If you’ve used my Gateway interface before this new storage engine was implemented, you probably have log data stored as JSON in /home/pi/gateway/gatewayLog.js. To upgrade just save your existing gatewayLog.db, upgrade your Pi to the latest release, copy the gatewayLog.db into /home/pi/gateway/data/db, then run this upgrade script in the same directory – it parses your gatewayLog.db data and puts it into binary files in a db subdirectory, ready for the new gateway.js script to use. Otherwise gateway.js will just create logs from scratch as data comes in.