Speed up and clean up Gajim's history

After a few years of using Gajim, the history of all messages, statuses, etc. can grow up to quite large size. In my case the logs.db file consumed 128MB of disk space.

Searching through the conversation history (in the Conversation History window) can take up to few seconds for each query. And after removing and adding various XMPP/Jabber (or Facebook Chat) accounts, there’s a lot of unnecessary Jabber IDs left in the database.

So let’s take a look at how to speed things up and remove unnecessary data form logs.db (but whether it will work for you or not, substantially depends on how you’ve been using Gajim and what you keep in Gajim’s history).

Make a backup first!

First of all: I’m not responsible for any damage you may do to your history file, your computer, house, cat, gold fish, friends, neighbors, etc. And you’re doing it at your own risk ;)

So, close Gajim if it’s still running, and make a backup, please. You should probably be able to find your logs.db file in ~/.local/share/gajim/logs.db. If it’s not there, look for it and change the following commands accordingly.

To make a backup, run:

sqlite3 ~/.local/share/gajim/logs.db .dump > backup.sql

It will create a backup of your logs.db, and write it to backup.sql file (for more specific info, please look at Gajim’s wiki page).

Remove old status messages

If you’re saving status changes in your Gajim’s history, it can grow up quite fast, because there are probably a few additional entries every day for each of your contacts. It’s the Log status changes of contacts option in Preferences. Personally, I like to keep such information, but I see no point to keep very old entries.

Let’s take a look at some numbers:

sqlite> select count(*) from logs where kind in (0,1);
1235830
sqlite> select count(*) from logs where kind not in (0,1);
552679

There are 1235830 entries in my logs.db file containing only status changes. The kind equal to 0 stands for normal statuses, and kind equal to 1 for group chat statuses. On contrary, there are only 552679 entries for other types of messages.

Most of these entries are almost empty (they just log show value of received XMPP stanzas, time, and a few more values), but they still use some space. And now multiply that space by 1235830.

So let’s remove it:

sqlite3 ~/.local/share/gajim/logs.db "delete from logs where kind in (0,1);"

Yep, that’s it. If you now look at the size of logs.db file (for example with ls -l command), you will not see any difference, but we’ll talk about decreasing the size of the file in a moment.

Remove unnecessary contacts

It probably won’t speed up anything (the difference will be probably almost unmeasurable), but if you don’t want to keep unnecessary Jabber IDs in your logs.db, then this is for you. In my case almost all of the unnecessary contacts were from Facebook Chat account that I’d added long time ago and haven’t used since.

The easy (and longer) way

There is a tool called gnome-history-manager that allows you to manage Gajim’s history and remove unwanted parts of the history. If you have only a few such contacts, or remember all Jabber IDs, then this is probably the way to go.

The fun way

I defined unnecessary contacts as contacts that have less than 16 messages (pick any number that works for you) in history. Because if it’s less than 16, then I probably don’t know this person, or it’s not in my roster anymore, or it is but I’m not using the account, or some completely different reason I haven’t thought about.

So let’s first get all contacts that satisfy this assumption:

sqlite3 ~/.local/share/gajim/logs.db \
  "select jids.jid_id,jids.jid,count(logs.message) as cc from jids left join logs on logs.jid_id = jids.jid_id group by jids.jid_id having cc < 16;" > to_remove

This is a very good moment for you to look at to_remove file and search for any contacts you don’t want to remove. Each line consists of contact’s id in database, Jabber ID and number of messages. If you want to keep any account, remove it from this file.

Now extract all id’s from that file:

to_remove_ids=`awk -F"|" '{print $1}' to_remove | tr '\n' ','`
to_remove_ids=${to_remove_ids%?} # remove trailing comma

And you may check if to_remove_ids variable contains something that looks like a correct comma-separated list of ids:

echo ${to_remove_ids}

Now remove all history of these contacts, and the contacts themselves:

sqlite3 ~/.local/share/gajim/logs.db "delete from logs where jid_id in (${to_remove_ids});"
sqlite3 ~/.local/share/gajim/logs.db "delete from jids where jid_id in (${to_remove_ids});"

And now your logs.db should not contain any unwanted entries.

Decrease the size of logs.db

If you remove some data from sqlite3 database, the file doesn’t shrink automatically. The parts that contained the data just stay empty. There is an easy way to decrease the size, though.

Let’s make a backup of our current version of logs.db (after all the deletions):

sqlite3 ~/.local/share/gajim/logs.db .dump > compact.sql

And now restore the backup:

mv ~/.local/share/gajim/logs.db logs.db.orig
sqlite3 ~/.local/share/gajim/logs.db < compact.sql

And there you go. My new logs.db now has 54MB, which is about 42% of the old size (128MB).

Depending on how much data you managed to remove, you may expect more or less significant improvement in execution time of queries accessing Gajim’s history.