Hello SQLite !
This November, we decided to open source one of our internal libraries to the world: the driver we use for data persistence (but not only!).
This library has been a core component of our software stack, and after using successfully the package internally for the last 18+ months, we think the API has stabilized enough, that many of the initial bugs have been sorted out, and we are happy to share the code more widely.
This not-so-short post tries to provide a high-level overview of the library, and maybe see if those ideas would help you in your project.
Design Decisions in the driver
SQLite provides a rock-solid engine for data storage: not only is it the most commonly used database in the world, new community extensions such as Litestream have turbocharged its abilities to serve as a fully functional engine in all cases.
Inspired by D.Crawshaw’s talk , we wanted to explore a different, lighter approach to persistence than the sql package of the standard library.
Goal 1: robustness
Persistence is the part of the code you can’t fix after the fact, so stability over anything else is pervasive through the code base.
At the bottom layer, we heavily rely on the built-in durability mechanisms of the engine (so much so that Close was a late addition to the library). We also chose a set of compile-time options to focus on modern good practices (multi-core CPU, pooling built-in, only accept standard SQL), and prevent API misuse – at a small runtime cost.
At the driver layer, we chose to restrict mapping between Go and SQLite types to a minimum, as to avoid surprises (e.g. time.Time is converted to the expected string format, time.Duration is not).
Goal 2: fluent interface
Go is often derided for verbose error management. This can be alleviated by storing computation state in structures (and maybe in the stack in the future!!), and threading it through methods. In this library, querying a record needs only a single error check, as in:
Goal 3: lightweight transaction management
SQLite offers a powerful savepoint capability that offers nested transaction abilities. The library leverages by mapping it to (nested) contexts that can be passed between queries, and naturally work with the language stack:
Functions that access the database with Exec method can then naturally be reused and composed in multiple transactions – the context will take care of assigning them to the right timeline.
Goal 4: rich debugging
SecurityHub provides a captive shell available over SSH (more on this in the third part of this post), and in development mode this grants direct access to the database. With a live connection to the system, one gets an experience similar to what is provided by client-server databases; including detailed performance plans and counters to peek at.
Building a storage layer from the driver
Our application storage is based on a key-value model: each entity in the system is serialized (in our case using cbor ), and stored as bytes in a two-column table (aptly named key and value ). This setting means fetching an entity is a short invocation (FindOne is a small wrapper to execute a query and scan the result).
The convenience offered by this model is undeniable for programmers but comes with risks: a change to a data structure used in code could indeed prevent us from reading older values. We use a simple build-time script (included in cmd/stability ) to alert developers early if the data structure they marked as serialized (see examples in the sqlite/stability package) changed, and therefore could trigger backward-compatibility issues.
We are also using SQLite as a document format for data interchange, and even in internal projects when we need a rich cache layer! Each of those projects uses a dedicated storage layer, probably a topic for a later installment (this post is already rather long).
Getting sucked in the virtual table blackhole
SQLite can call arbitrary user-level code, either through functions , or even virtual tables . Using a sprinkle of runtime reflection, this package provides high-level wrappers for both; usually making creating a virtual table a ~50 lines affaire.
Lowering the cost of exposing Go code as SQL constructs has dramatically improved our code practices:
- Modules usually expose one or more structured tables (uncorrelated from the underlying storage), and HTML pages are built via templates populated from SQL queries (also known as “PHP, the right way”)
- Arbitrary complex functions are easily combined into queries (such as, say, left-joining results from a natural language tokenizer to a ML model evaluation)
- Over our captive shell, virtual tables are an API, but with the productivity of a REPL environment powered by SQL queries
The code behind this feature is itself a testament to the expressivity you get with Go: when a new structure (or function) is registered, a virtual machine is built based on static information. Then the machine performs all bridging operations with C – and the benchmarks look good.
Maintaining a public package is a high commitment (especially for small companies): accept responsibility for fixing bugs quickly, provide a quick and friendly response to feature requests, and maintain API stability, … This explains why we are careful in the pace at which we share technology from our internal code base.
We think this package could provide a data point to think about designing storage layers (with, or without virtual tables) in a post-ORM world, and we are looking forward to seeing how other projects approach those problems.