How to Use git diff with an sqlite3 Database

If we have a git repository where there is a binary sqlite3 file, it is difficult to see what changed by only using a git diff command. But there is a neat git trick (learned from here) we can use to overcome this: we use git’s built-in textconv mechanism to dump the sqlite3 database on-the-fly so that we can use the standard git diff to see the changes.

The Problem: an sqlite3 Database is a Binary File

sqlite3 is a small but powerful database that, unlike its more powerful counterparts like MySQL and PostgreSQL, stores its data in a single binary file. And just like its more powerful counterparts, we can use the Structured Query Language (SQL) to query the database.

There are times that an sqlite3 database file is committed to a git repository. For example, a customer was using an internal tool that stored a small user database in a sqlite3 database file. This file was committed to the repo because it was a good place to keep it. This worked great when only a single developer was working on the database, but started getting tricky once more than that single developer was operating on the file.

Normally, git is really good at figuring out and displaying what changed in a file, but only for text files – not for binary files. And sqlite3 files are indeed binary files. So in this particular case, it became more and more difficult to handle changes that were done to this database.

But there is a neat little trick we can use to convert them to text files on-the-fly.

The Trick: Dump the sqlite3 Database on-the-fly

The trick is rather simple: git has built-in mechanisms to convert binary files into a format that can be used for diffing and patching. If we set them up correctly, we can use git diff and git log -p and other niceties to see exactly what changed inside those binary sqlite3 files!

Here’s how:

We need to define a custom diff mechanism, which we simply call sqlite3. We do this by issuing this git config command:

git config diff.sqlite3.textconv "sh -c 'sqlite3 \$0 .dump'"

This will create an entry in our repository’s .gitconfig file. If we want, we can also set this configuration for all repositories, like this:

git config --global diff.sqlite3.textconv "sh -c 'sqlite3 \$0 .dump'"

This will instruct git diff to pipe its files through the sqlite3 command and its .dump operation, which simply dumps its entire database as a big SQL statement. The output if this operation is then used as input to the normal git diff mechanism.

If we look into our .git/config file we should now see something like these lines in there:

[diff "sqlite3"]
    textconv = sh -c 'sqlite3 $0 .dump'

Those two lines were added when we ran the git config command above.

Next, we need to tell git that we want to use this diff mechanism for sqlite3 files. There is no prescribed extension for sqlite3 files, but .sqlite3 is a common one, so let’s use that. We should add the following line to our .gitattributes file for our repository:

*.sqlite3 diff=sqlite3

This tells the git command that it should use our newly defined sqlite3 diff mechanism for all files that match the *.sqlite3 pattern.

And that’s it! Let’s look at an example.

An Entire Example

We start by creating a new directory and an empty git repository:

mkdir new-directory
cd new-directory
git init

And we configure it with the mechanism from above:

git config diff.sqlite3.textconv "sh -c 'sqlite3 \$0 .dump'"
echo '*.sqlite3 diff=sqlite3' > .gitattributes

Next we create an sqlite3 database. We do this using the sqlite3 command line tool and its interactive mode:

sqlite3 database.sqlite3

We are now in interactive mode so let’s create a small database:

CREATE TABLE users (
    id INTEGER PRIMARY KEY,
    name TEXT NOT NULL,
    email TEXT NOT NULL
);

And, while still in interactive mode, we add some users into our database:

INSERT INTO users (name, email) VALUES ('Alice', 'alice@example.com');
INSERT INTO users (name, email) VALUES ('Bob', 'bob@example.com');

We can now use the SELECT statement to see that they are there:

SELECT * FROM users;

Which should give us the output:

1|Alice|alice@example.com
2|Bob|bob@example.com

We exit the interactive sqlite3 mode with:

.exit

Now we can commit our new database to our git repository:

git add database.sqlite3
git commit -m "A new database"

Now we have a git repository with a binary file that contains our small database. Let’s add another user and see what happens when we run git diff on our repo.

We enter interactive mode again:

sqlite3 database.sqlite3

And run:

INSERT INTO users (name, email) VALUES ('Charlie', 'charlie@example.com');

We then exit interactive mode:

.exit

Now let’s see if we can do a git diff:

$ git diff
diff --git a/database.sqlite3 b/database.sqlite3
index f56ddeb..6b1cfe5 100644
--- a/database.sqlite3
+++ b/database.sqlite3
@@ -7,4 +7,5 @@ CREATE TABLE users (
 );
 INSERT INTO users VALUES(1,'Alice','alice@example.com');
 INSERT INTO users VALUES(2,'Bob','bob@example.com');
+INSERT INTO users VALUES(3,'Charlie','charlie@example.com');
 COMMIT;

Oh look! We see that there is a new user entered into our database – just what we would like to see.

If we commit this change to the git repo, we can even use git log -p to see exactly what changed between revisions. Note that we can’t use git add -p to add our changes: git will complain about how only a binary file changed. But this is probably for the best, as we wouldn’t want to commit partial diffs in a situation like this anyway.

Conclusions

sqlite3 is a powerful and handy database that people use for a lot of different things. While committing sqlite3 files to a git repo may not always be the recommended way to work, it is something that happens from time to time. The trick presented here is a neat way to deal with the situation, in case it happens.

How to Use git diff with an sqlite3 Database

The Problem: an sqlite3 Database is a Binary File

The Trick: Dump the sqlite3 Database on-the-fly

An Entire Example

Conclusions

Written by Adam Dunkels, PhD