

Caching dynamic content with Apache's Mod-Rewrite

This page covers:
      Example Site Overview
      Caching with mod-rewrite
      Complete VirtualHost directive
      Apache Documentation


Dynamic content is easy to generate, but unless your content changes very frequently (and even if it does), caching it is better: a flat file is served faster to your viewers and spares your server a script run and a database query on every request. This is an extremely quick and dirty explanation of how to create flat html files from your dynamic content using Apache's mod-rewrite.

Please note that while all the configuration text and php code below use "", this code is not actually used on this site but IS in use on another site I run. These notes are rough and will be augmented in the future with more complete documentation and code examples.

This is not a particularly well-thought-out example of how to set this up; it's just one example of a working implementation. In this case the content caching was an afterthought that I simply added on top of what I'd already built. You may want to put more thought into your implementation.

Site Overview

All the content on our example site is database driven. The database has tables for user authentication (to add and edit content), site sections, and content. Any registered user can add content to the site. Whenever content is added or edited in a section, all the html files in that section are deleted, because each content page contains links to the other content in that section and would otherwise show a stale list. That isn't an issue for this particular site because content is updated infrequently, but you may want to architect your site differently.
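Invalidating a section, as described above, comes down to deleting every cached page in that section's directory. The original site did this in php; the Python below is only a minimal sketch, assuming each section is a directory under the document root that holds its cached pages as <id>.html files (the naming scheme this page uses for cached pages). `invalidate_section` and its section-name check are hypothetical.

```python
from pathlib import Path


def invalidate_section(docroot: str, section: str) -> int:
    """Delete every cached .html page directly inside <docroot>/<section>.

    Hypothetical helper. Returns the number of files removed; the next visit
    to any of those pages misses the cache and regenerates it with fresh links.
    """
    # Guard (an assumption): never let a section name point outside docroot.
    if section in ("", ".", "..") or "/" in section or "\\" in section:
        raise ValueError("bad section name: {!r}".format(section))
    section_dir = Path(docroot) / section
    if not section_dir.is_dir():
        return 0
    removed = 0
    # glob("*.html") is not recursive, so other sections are left alone.
    for page in section_dir.glob("*.html"):
        if page.is_file():
            page.unlink()
            removed += 1
    return removed
```

You would call something like this right after the insert or update query for that section succeeds.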

Caching with mod-rewrite

Apache's mod-rewrite module is one of my favorite and most frequently used modules, and caching is just one of its many uses. Here, the html files are created only when someone requests a page whose html file doesn't exist yet. Every link on every page of the site is an .html link; the content itself is generated by php pages and written out to flat files. When a requested html file does not exist, mod-rewrite rewrites the URL internally to the php page that generates the html files, and that page creates the file and displays the content to the visitor. Because this is an internal rewrite and not a redirect, the visitor never sees a ".php" URL. Every later request for the same page is served straight from the flat file, without running php or querying the database.
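The serve-or-generate flow above can be sketched in a few lines. This is a hypothetical Python illustration, not the author's php: `get_page` and the `render` callback (a stand-in for the database-backed php page) are made-up names, and writing to a temporary file before renaming it into place is an added assumption so that two simultaneous first visits can't leave a half-written page on disk.

```python
import os
import tempfile
from pathlib import Path
from typing import Callable


def get_page(docroot: str, url_path: str, render: Callable[[str], str]) -> str:
    """Return the html for url_path, generating and caching it on first request."""
    if ".." in Path(url_path).parts:
        raise ValueError("refusing path outside docroot: " + url_path)
    cached = Path(docroot) / url_path.lstrip("/")

    # Cache hit: a non-empty flat file exists. In the real setup Apache serves
    # this file itself and never runs any script.
    if cached.is_file() and cached.stat().st_size > 0:
        return cached.read_text(encoding="utf-8")

    # Cache miss: render the page, then write it where Apache will find it.
    html = render(url_path)
    cached.parent.mkdir(parents=True, exist_ok=True)
    fd, tmp_name = tempfile.mkstemp(dir=cached.parent, suffix=".tmp")
    with os.fdopen(fd, "w", encoding="utf-8") as fh:
        fh.write(html)
    os.chmod(tmp_name, 0o644)     # mkstemp creates 0600; make it readable by the web server
    os.replace(tmp_name, cached)  # atomic rename: readers never see a partial page
    return html
```

On the first request `render` runs and the page lands on disk; later requests for the same URL skip `render` entirely.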

httpd.conf: mod-rewrite directive

Three directives in my httpd.conf file are all I need to cache content. I've also left the commented-out RewriteLog and RewriteLogLevel directives in there: if you've never used mod-rewrite, you may want to turn them on to see how it works, especially if you're having problems.

    RewriteEngine On
    #RewriteLog "/home/www/"
    #RewriteLogLevel 5
    RewriteCond /www/%{SCRIPT_FILENAME} !-s
    RewriteRule ^\/(.*)\/(.*)\.html /gc.php?u=/$1/$2 [L]

What it does:

RewriteEngine On
Turns on rewriting.

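RewriteCond /www/%{SCRIPT_FILENAME} !-s
The first argument is the test string: the path /www/ followed by the value of the %{SCRIPT_FILENAME} server variable (server variables are referenced as %{NAME}). In httpd.conf (server or VirtualHost context) this variable still holds the requested URL path at this stage rather than a full filesystem path, which is why the directory prefix is prepended. The second argument is the condition pattern. It is usually a regular expression, but it can also be one of several special tests: -s checks whether the path is a regular file with a size greater than 0, and the ! means NOT. So this condition says: if the requested file doesn't exist (or is empty), rewrite with the rule below. A RewriteCond applies only to the RewriteRule immediately following it.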
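To make the -s test concrete, here is a rough Python equivalent of the condition. It is an illustration only, assuming the same plain string concatenation the config line performs (prefix plus %{SCRIPT_FILENAME}); `needs_rewrite` is a hypothetical name.

```python
import os


def needs_rewrite(prefix: str, script_filename: str) -> bool:
    """Rough equivalent of `RewriteCond <prefix>%{SCRIPT_FILENAME} !-s`.

    -s is true for a regular file whose size is greater than zero; the ! negates
    it, so this returns True (apply the rule) when the path is missing, is not a
    regular file (e.g. a directory), or is an empty file.
    """
    path = prefix + script_filename  # plain concatenation, as in the config line
    is_nonempty_file = os.path.isfile(path) and os.path.getsize(path) > 0
    return not is_nonempty_file
```

One side effect of using -s rather than -f (which only checks that a regular file exists): a page that was written out as a zero-byte file is regenerated on the next request instead of being served blank.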

RewriteRule ^\/(.*)\/(.*)\.html /gc.php?u=/$1/$2 [L]
$1 and $2 are backreferences to the two groups in the pattern (whatever is matched inside each pair of parentheses). The [L] (last) flag tells mod-rewrite to stop processing further rules once this one matches, so if you have multiple RewriteRules, their order is important. This rule captures the directory name and the html file name (without the .html extension) and rewrites the request to my php file, passing both together as a single query parameter, u=/directory/filename.
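Python's re module should behave the same way as Apache's regex engine for this simple pattern, so the rule can be tried out offline. The sketch below uses a hypothetical helper (`apply_rule` is not part of Apache) that returns the rewritten URL, or None when the rule doesn't match. Note that (.*) is greedy, so a deeper path still matches with everything up to the last slash landing in $1, and that without a trailing $ anchor a URL such as /news/12.htmlx matches too.

```python
import re
from typing import Optional

# The pattern from the RewriteRule; "\/" is simply an escaped "/".
RULE = re.compile(r"^\/(.*)\/(.*)\.html")


def apply_rule(uri: str) -> Optional[str]:
    """Return the substitution /gc.php?u=/$1/$2, or None if the rule doesn't match."""
    m = RULE.search(uri)
    if m is None:
        return None
    return "/gc.php?u=/{}/{}".format(m.group(1), m.group(2))


print(apply_rule("/news/12.html"))  # /gc.php?u=/news/12
print(apply_rule("/a/b/c.html"))    # /gc.php?u=/a/b/c
print(apply_rule("/index.html"))    # None (only one slash)
```

If you only want to match two-level pages with numeric ids, a stricter pattern such as ^/([^/]+)/([0-9]+)\.html$ would reject both of those edge cases; that is a suggestion, not what the original configuration uses.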

I've named all my html files "id.html", where id is the content id in the database. The directory name is also the section name, so my php page (gc.php) retrieves the content by querying the database by section and content id.
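gc.php itself isn't shown on this page. As a rough, hypothetical illustration of its first step, the Python below splits the u parameter into a section name and a numeric content id and works out where the generated page should be written. The validation rules (section names limited to letters, digits, underscores and hyphens; ids limited to digits) are assumptions, but some check like this matters, because the value comes straight from the requested URL and ends up in both a database query and a file path.

```python
import re
from typing import Tuple

# Assumed format of the u parameter: "/<section>/<numeric id>"
U_FORMAT = re.compile(r"/([A-Za-z0-9_-]+)/([0-9]+)")


def parse_u(u: str) -> Tuple[str, int]:
    """Split u ("/section/id") into (section, content id); reject anything else."""
    m = U_FORMAT.fullmatch(u)
    if m is None:
        raise ValueError("expected /section/id, got {!r}".format(u))
    return m.group(1), int(m.group(2))


def cache_path(docroot: str, section: str, content_id: int) -> str:
    """Location of the generated page: <docroot>/<section>/<id>.html."""
    return "{}/{}/{}.html".format(docroot.rstrip("/"), section, content_id)


section, content_id = parse_u("/news/12")
print(section, content_id)                       # news 12
print(cache_path("/www/", section, content_id))  # /www/news/12.html
```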

For example, if the file, "", does not exist, it gets rewritten as "" and my gc.php then writes out the html file and displays its contents.

RewriteLog "/home/www/"
RewriteLogLevel 5
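RewriteLog sets the file the rewrite engine writes its log to.
RewriteLogLevel sets the verbosity of that log: 0 means no logging, and 9 or more means almost every action is logged. High levels slow Apache down noticeably, so use them only while debugging. Both directives exist only in Apache 2.2 and earlier; Apache 2.4 removed them in favor of per-module log levels (for example, LogLevel alert rewrite:trace3), with the output going to the regular error log.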

The Complete VirtualHost directive

This is the complete VirtualHost directive from my httpd.conf file, including the rewrite directives; these lines go inside the site's <VirtualHost> ... </VirtualHost> block. Do not copy it in its entirety - it is here just as an example. You will have to modify it for your particular website and server.

    DocumentRoot /www/
    ServerAlias *
    RewriteEngine On
    #RewriteLog "/www/"
    #RewriteLogLevel 5
    RewriteCond /www/%{SCRIPT_FILENAME} !-s
    RewriteRule ^\/(.*)\/(.*)\.html /gc.php?u=/$1/$2 [L]
    ErrorLog /www/
    CustomLog /www/ combined

Apache's Mod-Rewrite documentation

The above is a quick example and explanation of mod-rewrite. I highly recommend you visit the Apache site for complete information on mod-rewrite. It's a powerful and useful tool. Know it; love it. And have fun!

© 1999-2011. all rights reserved. // site created and maintained by kathy ahn