If we have a git repository where there is a binary sqlite3 file, it is difficult to see what changed by only using a git diff
command. But there is a neat git trick (learned from here) we can use to overcome this: we use git’s built-in textconv mechanism to dump the sqlite3 database on-the-fly so that we can use the standard git diff
to see the changes.
The Problem: an sqlite3 Database is a Binary File
sqlite3 is a small but powerful database that, unlike its more powerful counterparts like MySQL and PostgreSQL, stores its data in a single binary file. And just like its more powerful counterparts, we can use the Structured Query Language (SQL) to query the database.
There are times that an sqlite3 database file is committed to a git repository. For example, a customer was using an internal tool that stored a small user database in a sqlite3 database file. This file was committed to the repo because it was a good place to keep it. This worked great when only a single developer was working on the database, but started getting tricky once more than that single developer was operating on the file.
Normally, git is really good at figuring out and displaying what changed in a file, but only for text files – not for binary files. And sqlite3 files are indeed binary files. So in this particular case, it became more and more difficult to handle changes that were done to this database.
But there is a neat little trick we can use to convert them to text files on-the-fly.
The Trick: Dump the sqlite3 Database on-the-fly
The trick is rather simple: git has built-in mechanisms to convert binary files into a format that can be used for diffing and patching. If we set them up correctly, we can use git diff
and git log -p
and other niceties to see exactly what changed inside those binary sqlite3 files!
Here’s how:
We need to define a custom diff mechanism, which we simply call sqlite3
. We do this by issuing this git config
command:
git config diff.sqlite3.textconv "sh -c 'sqlite3 \$0 .dump'"
This will create an entry in our repository’s .gitconfig
file. If we want, we can also set this configuration for all repositories, like this:
git config --global diff.sqlite3.textconv "sh -c 'sqlite3 \$0 .dump'"
This will instruct git diff
to pipe its files through the sqlite3
command and its .dump
operation, which simply dumps its entire database as a big SQL statement. The output if this operation is then used as input to the normal git diff
mechanism.
If we look into our .git/config
file we should now see something like these lines in there:
[diff "sqlite3"]
textconv = sh -c 'sqlite3 $0 .dump'
Those two lines were added when we ran the git config
command above.
Next, we need to tell git that we want to use this diff mechanism for sqlite3 files. There is no prescribed extension for sqlite3 files, but .sqlite3
is a common one, so let’s use that. We should add the following line to our .gitattributes
file for our repository:
*.sqlite3 diff=sqlite3
This tells the git
command that it should use our newly defined sqlite3
diff mechanism for all files that match the *.sqlite3
pattern.
And that’s it! Let’s look at an example.
An Entire Example
We start by creating a new directory and an empty git repository:
mkdir new-directory
cd new-directory
git init
And we configure it with the mechanism from above:
git config diff.sqlite3.textconv "sh -c 'sqlite3 \$0 .dump'"
echo '*.sqlite3 diff=sqlite3' > .gitattributes
Next we create an sqlite3 database. We do this using the sqlite3
command line tool and its interactive mode:
sqlite3 database.sqlite3
We are now in interactive mode so let’s create a small database:
CREATE TABLE users (
id INTEGER PRIMARY KEY,
name TEXT NOT NULL,
email TEXT NOT NULL
);
And, while still in interactive mode, we add some users into our database:
INSERT INTO users (name, email) VALUES ('Alice', 'alice@example.com');
INSERT INTO users (name, email) VALUES ('Bob', 'bob@example.com');
We can now use the SELECT
statement to see that they are there:
SELECT * FROM users;
Which should give us the output:
1|Alice|alice@example.com
2|Bob|bob@example.com
We exit the interactive sqlite3 mode with:
.exit
Now we can commit our new database to our git repository:
git add database.sqlite3
git commit -m "A new database"
Now we have a git repository with a binary file that contains our small database. Let’s add another user and see what happens when we run git diff
on our repo.
We enter interactive mode again:
sqlite3 database.sqlite3
And run:
INSERT INTO users (name, email) VALUES ('Charlie', 'charlie@example.com');
We then exit interactive mode:
.exit
Now let’s see if we can do a git diff:
$ git diff
diff --git a/database.sqlite3 b/database.sqlite3
index f56ddeb..6b1cfe5 100644
--- a/database.sqlite3
+++ b/database.sqlite3
@@ -7,4 +7,5 @@ CREATE TABLE users (
);
INSERT INTO users VALUES(1,'Alice','alice@example.com');
INSERT INTO users VALUES(2,'Bob','bob@example.com');
+INSERT INTO users VALUES(3,'Charlie','charlie@example.com');
COMMIT;
Oh look! We see that there is a new user entered into our database – just what we would like to see.
If we commit this change to the git repo, we can even use git log -p
to see exactly what changed between revisions. Note that we can’t use git add -p
to add our changes: git will complain about how only a binary file changed. But this is probably for the best, as we wouldn’t want to commit partial diffs in a situation like this anyway.
Conclusions
sqlite3 is a powerful and handy database that people use for a lot of different things. While committing sqlite3 files to a git repo may not always be the recommended way to work, it is something that happens from time to time. The trick presented here is a neat way to deal with the situation, in case it happens.