Generating Sitemaps in Rails
Code updated 9/18/2009
This week I added sitemap files to my Rails site, PriceChirp. Sitemaps help search engines find all of your content. They are especially helpful for enumerating pages that web crawlers have trouble discovering, such as content reachable only through database searches.
The web is full of instructions for generating sitemaps on the fly with rxml templates. That approach does not scale well once your site has thousands of links, because every request rebuilds the whole file. A better method is to generate the sitemaps periodically and serve the cached files when requested.
www.fortytwo.gr has a good example of generating sitemaps with Rails. I've taken that code and fixed and extended it to fit my needs.
Understanding Sitemaps
There are two types of sitemap files:
- Sitemap files, which contain the URLs of your site
- Sitemap index files, which contain a list of your sitemap files
Sitemap files
The xml format of the sitemap file is like this:
<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url>
    <loc>http://www.example.com/</loc>
    <lastmod>2005-01-01</lastmod>
    <changefreq>monthly</changefreq>
    <priority>0.8</priority>
  </url>
  ...
  ...
</urlset>
Where
- loc is the URL of the page
- lastmod is the date the page was last modified
- changefreq is how often the page is expected to change
- priority is the importance of this URL relative to the other URLs on your site
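As a rough sketch of how these four fields map onto XML, the snippet below builds a two-entry sitemap with REXML (bundled with Ruby; on Ruby 3.0 and later it ships as the rexml gem). The helper name build_url_entry and the example URLs are made up for illustration. Only loc is required by the sitemap protocol, so the other fields are skipped when they are nil.

```ruby
require "rexml/document"
require "time"

# Hypothetical helper: builds one <url> element, skipping optional fields that are nil.
def build_url_entry(loc, lastmod = nil, changefreq = nil, priority = nil)
  url = REXML::Element.new("url")
  url.add_element("loc").text = loc
  url.add_element("lastmod").text = lastmod if lastmod
  url.add_element("changefreq").text = changefreq if changefreq
  url.add_element("priority").text = priority if priority
  url
end

doc = REXML::Document.new
doc << REXML::XMLDecl.new("1.0", "UTF-8")
urlset = doc.add_element("urlset", "xmlns" => "http://www.sitemaps.org/schemas/sitemap/0.9")

# lastmod accepts a plain date or a full W3C datetime.
urlset << build_url_entry("http://www.example.com/", "2005-01-01", "monthly", "0.8")
urlset << build_url_entry("http://www.example.com/about", Time.utc(2005, 1, 1, 12, 0, 0).iso8601)

xml = doc.to_s
puts xml
```

The second entry leaves changefreq and priority out entirely rather than writing empty tags.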
Sitemap Index file
The index file contains a list of the sitemap files you want to include.
The xml format of that file is as follows:
<?xml version="1.0" encoding="UTF-8"?>
<sitemapindex xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <sitemap>
    <loc>http://www.example.com/sitemap1.xml.gz</loc>
    <lastmod>2004-10-01T18:23:17+00:00</lastmod>
  </sitemap>
  <sitemap>
    <loc>http://www.example.com/sitemap2.xml.gz</loc>
    <lastmod>2005-01-01</lastmod>
  </sitemap>
</sitemapindex>
Where
- loc is the URL of the sitemap file
- lastmod is the date the sitemap file was last modified
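To get a feel for how a consumer might read such an index, here is a minimal sketch that parses a copy of the example index with REXML and collects each sitemap's loc and lastmod. The method name list_sitemaps is hypothetical, and a real crawler presumably does far more validation than this.

```ruby
require "rexml/document"

index_xml = <<~XML
  <?xml version="1.0" encoding="UTF-8"?>
  <sitemapindex xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
    <sitemap>
      <loc>http://www.example.com/sitemap1.xml.gz</loc>
      <lastmod>2004-10-01T18:23:17+00:00</lastmod>
    </sitemap>
    <sitemap>
      <loc>http://www.example.com/sitemap2.xml.gz</loc>
      <lastmod>2005-01-01</lastmod>
    </sitemap>
  </sitemapindex>
XML

# Hypothetical helper: returns [loc, lastmod] pairs; lastmod is nil when absent.
def list_sitemaps(xml)
  doc = REXML::Document.new(xml)
  pairs = []
  doc.root.each_element do |sitemap|
    next unless sitemap.name == "sitemap"
    fields = {}
    sitemap.each_element { |child| fields[child.name] = child.text }
    pairs << [fields["loc"], fields["lastmod"]]
  end
  pairs
end

list_sitemaps(index_xml).each do |loc, lastmod|
  puts "#{loc} (last modified: #{lastmod || 'unknown'})"
end
```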
Building Sitemaps in Rails
To build the sitemap files in Rails, we need three things: a REXML-based library in /app/helpers that builds the sitemap files and the index, a rake script in /lib/tasks that does the work, and a crontab entry that runs the rake script periodically.
Helper Classes
These helper classes generate sitemaps and sitemap indexes using classes derived from REXML::Document and REXML::Element. In the file /app/helpers/sitemap.rb:
require 'rexml/document'

# A single <url> entry; only loc is required.
class SitemapUrl < REXML::Element
  def initialize(loc, lastmod = nil, changefreq = nil, priority = nil)
    @loc = loc
    @lastmod = lastmod
    @changefreq = changefreq
    @priority = priority
    super("url")
    create_elements
  end

  def create_elements
    # add location
    el = self.add_element("loc")
    el.text = @loc
    # the remaining fields are optional
    if @lastmod
      el = self.add_element("lastmod")
      el.text = @lastmod
    end
    if @changefreq
      el = self.add_element("changefreq")
      el.text = @changefreq
    end
    if @priority
      el = self.add_element("priority")
      el.text = @priority
    end
  end
end

# A sitemap file: a <urlset> document holding SitemapUrl entries.
class Sitemap < REXML::Document
  attr_accessor :loc, :lastmod, :urls

  def initialize(loc = nil, lastmod = nil)
    # super() with explicit parentheses: a bare super would forward loc and
    # lastmod to REXML::Document.new, which expects XML source and a context
    super()
    @loc = loc
    @lastmod = lastmod
    self << REXML::XMLDecl.new("1.0", "UTF-8")
    urlset = add_element("urlset")
    urlset.add_attributes('xmlns' => "http://www.sitemaps.org/schemas/sitemap/0.9")
    @urls = self.root
  end

  def to_xml
    to_s
  end

  def add_url(loc, lastmod = nil, changefreq = nil, priority = nil)
    @urls << SitemapUrl.new(loc, lastmod, changefreq, priority)
  end
end

# A sitemap index file: a <sitemapindex> document listing Sitemap locations.
class SitemapIndex < REXML::Document
  attr_accessor :sitemaps

  def initialize
    super()
    self << REXML::XMLDecl.new("1.0", "UTF-8")
    sitemapindex = add_element("sitemapindex")
    sitemapindex.add_attributes('xmlns' => "http://www.sitemaps.org/schemas/sitemap/0.9")
  end

  def add_sitemap(sitemap)
    el = self.root.add_element("sitemap")
    loc = el.add_element("loc")
    loc.text = sitemap.loc
  end

  def to_xml
    to_s
  end
end
Rake Script to Generate Sitemap
By creating a rake task, we can generate our sitemaps at will. Rake tasks have full access to our models. The file /lib/tasks/sitemaps.rake:
namespace :sitemap do
  desc "Create Index"
  task(:create_index => :environment) do
    puts "Creating Index"
    items = Sitemap.new("http://pricechirp.com/sitemap_items.xml.gz")
    statics = Sitemap.new("http://pricechirp.com/sitemap_static.xml.gz")
    index = SitemapIndex.new
    index.add_sitemap(items)
    index.add_sitemap(statics)
    FileUtils.rm(File.join(RAILS_ROOT, "public/sitemap_index.xml.gz"), :force => true)
    f = File.new(File.join(RAILS_ROOT, "public/sitemap_index.xml"), 'w')
    index.write(f, 2)
    f.close
    system("gzip #{File.join(RAILS_ROOT, 'public/sitemap_index.xml')}")
  end

  desc "Create all sitemaps"
  task(:create_sitemaps => :environment) do
    # build both sitemaps first, then the index that points at them
    Rake::Task["sitemap:items"].invoke
    Rake::Task["sitemap:static"].invoke
    Rake::Task["sitemap:create_index"].invoke
  end

  desc "Create Items Sitemap"
  task(:items => :environment) do
    sitemap = Sitemap.new
    # add every item
    user = User.find_by_login('default')
    for i in Item.find(:all, :select => "id,status_change_at", :conditions => ['user_id = ?', user.id])
      sitemap.add_url("http://pricechirp.com/items/#{i.id}", w3c_date(i.status_change_at), nil, '.5')
    end
    puts "#{sitemap.urls.length} total urls"
    # delete the file
    FileUtils.rm(File.join(RAILS_ROOT, "public/sitemap_items.xml.gz"), :force => true)
    f = File.new(File.join(RAILS_ROOT, "public/sitemap_items.xml"), 'w')
    sitemap.write(f, 2)
    f.close
    system("gzip #{File.join(RAILS_ROOT, 'public/sitemap_items.xml')}")
  end

  desc "Create Static Sitemap"
  task(:static => :environment) do
    sitemap = Sitemap.new
    sitemap.add_url("http://pricechirp.com/", w3c_date(Time.now), 'daily', '1.0')
    sitemap.add_url("http://pricechirp.com/faq", nil, 'monthly', '.5')
    sitemap.add_url("http://pricechirp.com/contact/new", nil, nil, '.5')
    sitemap.add_url("http://pricechirp.com/items/search", nil, nil, '.5')
    sitemap.add_url("http://pricechirp.com/signup", nil, nil, '.5')
    puts "#{sitemap.urls.length} total urls"
    # delete the file
    FileUtils.rm(File.join(RAILS_ROOT, "public/sitemap_static.xml.gz"), :force => true)
    f = File.new(File.join(RAILS_ROOT, "public/sitemap_static.xml"), 'w')
    sitemap.write(f, 2)
    f.close
    system("gzip #{File.join(RAILS_ROOT, 'public/sitemap_static.xml')}")
  end

  # format a time as a W3C datetime in UTC
  def w3c_date(date)
    date.utc.strftime("%Y-%m-%dT%H:%M:%S+00:00")
  end
end
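Each task above repeats the same delete, write, and gzip steps. One possible way to consolidate them, sketched here as an assumption rather than as the code I actually run, is a small helper built on Ruby's standard Zlib::GzipWriter, which also avoids shelling out to gzip. The name write_gzipped and the temporary directory in the example are hypothetical.

```ruby
require "zlib"
require "tmpdir"

# Hypothetical helper: writes xml to path as a gzip file, replacing any
# existing file at that path.
def write_gzipped(path, xml)
  Zlib::GzipWriter.open(path) do |gz|
    gz.write(xml)
  end
  path
end

# Example usage with a throwaway directory standing in for RAILS_ROOT/public.
Dir.mktmpdir do |dir|
  target = File.join(dir, "sitemap_static.xml.gz")
  write_gzipped(target, "<urlset></urlset>")

  # Read the file back to confirm the round trip.
  restored = Zlib::GzipReader.open(target) { |gz| gz.read }
  puts restored
end
```

Inside a task this could presumably be called as write_gzipped(File.join(RAILS_ROOT, 'public/sitemap_static.xml.gz'), sitemap.to_xml). Note that to_xml does not indent the output the way sitemap.write(f, 2) does.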
The rake script gives us:
rake sitemap:create_index     # Create Index
rake sitemap:create_sitemaps  # Create all sitemaps
rake sitemap:items            # Create Items Sitemap
rake sitemap:static           # Create Static Sitemap
Using a Crontab to Automate the Rake Task
Now we can create a crontab entry to regenerate our sitemaps automatically. This one runs at five minutes past every second hour and appends the output to a log file:
5 */2 * * * cd /path/to/your/application/ && /usr/bin/rake sitemap:create_sitemaps RAILS_ENV=production >> /path/to/your/logs/sitemaps.log
Publishing your sitemaps
Now that you have sitemaps, add a line to your robots.txt file to let search engines know about them. You should also submit the sitemap index to Google through Webmaster Tools, which helps confirm that the files are well formed.
robots.txt:
Sitemap: http://pricechirp.com/sitemap_index.xml.gz
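Before submitting, it can be worth a quick local check that a generated file decompresses and parses. The sketch below is one way to do that with the standard zlib library and REXML. The method name count_sitemap_urls and the sample file it creates are hypothetical stand-ins for the real files in public/.

```ruby
require "zlib"
require "rexml/document"
require "tmpdir"

# Hypothetical check: decompresses a gzipped sitemap, parses it, and returns
# the number of <url> entries. REXML raises if it cannot parse the XML.
def count_sitemap_urls(path)
  xml = Zlib::GzipReader.open(path) { |gz| gz.read }
  doc = REXML::Document.new(xml)
  unless doc.root && doc.root.name == "urlset"
    raise ArgumentError, "root element is not <urlset>"
  end
  count = 0
  doc.root.each_element { |el| count += 1 if el.name == "url" }
  count
end

# Build a tiny sample sitemap in a temporary directory and check it.
Dir.mktmpdir do |dir|
  path = File.join(dir, "sitemap_sample.xml.gz")
  Zlib::GzipWriter.open(path) do |gz|
    gz.write('<?xml version="1.0" encoding="UTF-8"?>' \
             '<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">' \
             '<url><loc>http://www.example.com/</loc></url>' \
             '<url><loc>http://www.example.com/faq</loc></url>' \
             '</urlset>')
  end
  puts "#{count_sitemap_urls(path)} urls"
end
```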