Adding token offsets to SQLite syntax errors

If you've used SQLite's shell before, you might be used to errors like this when there's an issue with the syntax of the query.

Parse error: near "(": syntax error
  eated timestamp NOT NULL DEFAULT current_timestamp(),   lastchange timestamp N
                                      error here ---^

You might still be a little confused why that's a syntax error (in MySQL it's a function, in SQLite it's a just a keyword) but you're shown where it is. In the event of the input, which was a CREATE TABLE statement, you could have been thinking that the issue is with the opening bracket for CREATE TABLE ... (.

What if we pass the same input to SQLite through their C interface rather than the shell?

near "(": syntax error

It's a lot less helpful. How can we get more information so that we can reproduce the shell's behaviour?

In SQLite 3.38.0, support was added for sqlite3_error_offset.

If the most recent error references a specific token in the input SQL, the sqlite3_error_offset() interface returns the byte offset of the start of that token. The byte offset returned by sqlite3_error_offset() assumes that the input SQL is UTF8. If the most recent error does not reference a specific token in the input SQL, then the sqlite3_error_offset() function returns -1.

If we add this to our syntax error messages, we can use that offset to find the token in our query. In the case of Cloudflare D1, this change needs to be made in workerd which is the open-source runtime powering Cloudflare Workers.

diff --git a/src/workerd/util/sqlite.c++ b/src/workerd/util/sqlite.c++
index 51d06dee3cb..711edc75322 100644
--- a/src/workerd/util/sqlite.c++
+++ b/src/workerd/util/sqlite.c++
@@ -51,14 +51,24 @@ namespace workerd {
     int _ec = code; \
     /* SQLITE_MISUSE doesn't put error info on the database object, so check it separately */ \
     KJ_ASSERT(_ec != SQLITE_MISUSE, "SQLite misused: " #code, ##__VA_ARGS__); \
-    SQLITE_REQUIRE(_ec == SQLITE_OK, sqlite3_errmsg(db), " " #code, ##__VA_ARGS__); \
+    kj::String msg = kj::str(sqlite3_errmsg(db)); \
+    int offset = sqlite3_error_offset(db); \
+    if (offset != -1) { \
+      msg = kj::str(msg, " at offset ", offset); \
+    } \
+    SQLITE_REQUIRE(_ec == SQLITE_OK, msg, " " #code, ##__VA_ARGS__); \
   } while (false)
https://github.com/cloudflare/workerd/pull/1392

The implementation here uses a macro that wraps the call to SQLite for error handling & reporting SQLITE_MISUSE. The error message is obtained from SQLite via the sqlite3_errmsg API. In the event that sqlite3_error_offset returns -1 then the error was not caused by a specific token, so we won't add the offset information to the error.

With this additional information, tooling can identify the token's location in the input. At the beginning of this blog, the issue was with the usage of current_timestamp() so let's feed that query into D1 again.

near "(": syntax error at offset 277

If we substring the query, giving us 25 characters either side of the token and removing newlines, we can reproduce the shell's behaviour.

> const sql = `CREATE TABLE accounts (
...   id tinytext NOT NULL,
...   username varchar(32) NOT NULL,
...   disabled tinyint(1) NOT NULL DEFAULT 0,
...   email varchar(255) NOT NULL,
...   password tinytext NOT NULL,
...   oplvl tinyint(4) NOT NULL DEFAULT 0,
...   created timestamp NOT NULL DEFAULT current_timestamp(),
...   lastchange timestamp NOT NULL DEFAULT current_timestamp(),
...   token tinytext NOT NULL DEFAULT '',
...   session tinytext NOT NULL DEFAULT ''
...   );`
> console.log(sql.substring(277 - 25, 277 + 25).replace(/\r?\n|\r/g, "")); console.log('-'.repeat(25).concat('^'));
DEFAULT current_timestamp(),  lastchange timestam
-------------------------^

This behaviour isn't currently available in Wrangler as the change to workerd is yet to be released, but it will be soon!