A fairytail about Ruby on Rails, Heroku and migrating data to a new database schema

Since I like testing new stuff, and this website serves me very well as a nice small guinea pig for any kind of experimentation (mostly trying out web frameworks and different deployment options), I ended up needing to migrate website’s data to a new database schema. And this time I switched from Play Framework 1.2 to Ruby on Rails 3.2. Of course, I could have modified my models to use the old schema, but that would be less interesting and probably not worth the hassle for a not-critical-so-let’s-have-fun application.

Tools

The tools that are required for this to work are (some of them quite obvious):

Database

I use PostgreSQL for development, because it’s also default on Heroku. But I guess SQLite3 could have been used too - I haven’t tried that though. I needed two databases for my purposes, and the process of creating them was quite straightforward (at least on Arch Linux):

su - postgres
createdb mydatabase-dev
createdb mydatabase-tmp

Here mydatabase-dev represents the database used by the Rails application, and mydatabase-tmp is the temporary database used to pull the old database from Heroku.

Heroku Toolbelt offers few very convenient commands for downloading and uploading the database, which require the taps gem. If you don’t have it, and you want to use it, it’s a good moment to run:

gem install taps

The two commands crucial for this operation to work are: heroku db:pull, which pulls database schema and data from Heroku and heroku db:push, which does the opposite.

These commands use configuration from config/database.yml by default if you run it from Rails’ application folder, but that’s not necessarily what we want. Let’s get the database from our old application.

heroku db:pull --app my-old-app postgres://myuser@localhost/mydatabase-tmp

Now we have our database available locally and we need some clever way to insert the data to the new database (mydatabase-dev). The most convenient solution that I’ve found is to use Ruby on Rails models from the new application. It has also one small advantage - during conversion process our data goes through validations defined in the models, so we shouldn’t be able to insert any obviously harmful data. But of course, before doing that, config/database.yml has to be edited to specify mydatabase-dev as database for development environment, and rake db:migrate has to be executed. Let’s look at the script:

require "pg"

# load whole rails environment
ENV['RAILS_ENV'] = "development"
require "./config/environment.rb"

source_db = PG.connect(dbname: "mydatabase-tmp", host: 'localhost')

source_db.exec("select * from category") do |result|
  result.each do |row|
    puts "adding category: #{row['title']}"
    Category.create(id: row['id'], name: row['name'])
  end
end

# some more tables to migrate
# (...)

source_db.exec("select * from post") do |result|
  result.each do |row|
    puts "adding post: #{row['title']}"
    Post.create(title: row['title'], slug: row['slug'], created_at: row['createdat'], category_id: row['category_id'])
  end
end

source_db.finish

It has been simplified a little bit of course, since knowing all tables and all columns in those tables is not particularly interesting. More interesting is what this script actually does. We start with the obvious require "pg" needed to include classes used to access the database. Then we have:

ENV['RAILS_ENV'] = "development"
require "./config/environment.rb"

and these two lines are responsible for loading Rails’ development environment. We want to use application’s models, don’t we?

Next, source_db = PG.connect(dbname: "mydatabase-tmp", host: 'localhost') doesn’t do any magic. It just creates a new connection to the specified database (equivalent of PG::Connection.new).

The actually interesting part is:

source_db.exec("select * from category") do |result|
  result.each do |row|
    puts "adding category: #{row['title']}"
    Category.create(id: row['id'], name: row['name'])
  end
end

where we take the PG::Result object returned by source_db.exec(...) call, iterate over it, and create categories using Rails’ Category model. Procedure for all other tables is analogical.

Deployment

Deploying new version of the application as the same Heroku app and then pushing the new database would probably result in a quite long downtime, especially in case of some unexpected problems. So the simple solution, is to deploy it as a new Heroku application and push migrated database there. And just switch domain assignments afterwards (remove domain from the old application and add it to the new one).

So after deploying the application, all that’s left is to run:

heroku db:push --app my-new-app postgres://myuser@localhost/mydatabase-dev

…and hope that it doesn’t fail.