Cryptic Arcana

With great power comes great unreadability, at least in the case of Apache’s mod_rewrite. I recently remarked on my reorganisation of SoylentRed’s archives, but the same old archives are still there. What gives? I made some significant changes on my local test server to replace the old-style URLs like http://www.soylentred.net/archives/?month=jul-2004 with the more useful http://www.soylentred.net/archives/2004/07/ (remember: at the time of writing that URL won’t work). In fact the new system has five types of archive page. There’s the complete archive, http://www.soylentred.net/archives/, which displays the titles for every post ever; the yearly archive, http://www.soylentred.net/archives/2004/, which displays the titles for every post in the given year; the monthly archive, http://www.soylentred.net/archives/2004/07/, which displays complete entries for the whole month; the daily archive, http://www.soylentred.net/archives/2004/07/06/, which displays all entries for that day; and the individual archive, http://www.soylentred.net/archives/entries/187, which is the new style for permalinks.

Uh, yeah, permalinks. Won’t they be messed up? That’s the beauty. It’s also the reason I’ve called this post ‘cryptic arcana’. I’m talking about mod_rewrite, the Apache module that rewrites requests on the fly according to a series of regular expressions. You see there are no directories on the server called /2004/ or /entries/. That wouldn’t scale. Instead I have a set of rules:

# Rewrite archive URLs:
RewriteRule ^archive/([0-9]{4})/([0-9]{1,2})/([0-9]{1,2})/?$ archive/?year=$1&month=$2&day=$3
RewriteRule ^archive/([0-9]{4})/([0-9]{1,2})/?$ archive/?year=$1&month=$2
RewriteRule ^archive/([0-9]{4})/?$ archive/?year=$1
RewriteRule ^archive/entries/([0-9]+)/?$ archive/entry.php?id=$1

For example, the first rule looks for any request of the form archive/ followed by four digits, a slash, one or two digits, a slash, one or two digits and an optional trailing slash. It then rewrites it in the form archive/?year=, then the four-digit block, then &month=, then the first one- or two-digit block, then &day=, then the second one- or two-digit block. The advantage of this second form of the URL (which incidentally would work if you typed it directly rather than using the pretty form) is that it directs every archive request to a single script, so there’s no need for separate directories for every year, month and day since the site launched.
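To see what the four rules above do without touching Apache, here’s a rough Python sketch. The patterns are copied straight from the rules; the function name rewrite and the list name RULES are made up for illustration. It’s a simplification rather than a model of mod_rewrite itself: it stops at the first matching rule and treats the path and query string as one string, which is enough to show the matching but not how Apache actually processes a request.

```python
import re

# Python transcription of the four RewriteRule lines above. The patterns
# are copied verbatim; only the substitutions change, because Python's
# re.sub spells backreferences \1 rather than $1.
RULES = [
    (r"^archive/([0-9]{4})/([0-9]{1,2})/([0-9]{1,2})/?$",
     r"archive/?year=\1&month=\2&day=\3"),
    (r"^archive/([0-9]{4})/([0-9]{1,2})/?$",
     r"archive/?year=\1&month=\2"),
    (r"^archive/([0-9]{4})/?$",
     r"archive/?year=\1"),
    (r"^archive/entries/([0-9]+)/?$",
     r"archive/entry.php?id=\1"),
]


def rewrite(path):
    """Return the rewritten form of path, or path unchanged if no rule matches."""
    for pattern, substitution in RULES:
        if re.match(pattern, path):
            # Simplification: stop at the first rule that matches.
            return re.sub(pattern, substitution, path)
    return path


for example in ("archive/2004/07/06/", "archive/2004/07", "archive/entries/187"):
    print(example, "->", rewrite(example))
```

Anything that matches none of the patterns, such as a plain archive/ or a three-digit month, comes back untouched, which is why the complete archive needs no rule of its own.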

Okay, now to get back to the permalink issue. The old archive links point to archive/?month=MMMYYYY, where MMM is the three-letter abbreviation of the month name and YYYY is the four-digit year. You may notice that this is exactly the same script as the new URLs ultimately point to. Then it’s a simple matter of PHP mojo to sort the ‘month=MMMYYYY’-style requests from the ‘year=YYYY&month=MM&day=DD’-style requests. The old permalinks point to archive/?id=N, where N is the numerical ID of the entry. Again, this is the same script that deals with all the other requests. In this case it simply looks for the ‘id=’ query string and, if it’s found, redirects to the new URL archive/entries/N. This URL, when the browser asks for it, will then be subject to the same mod_rewrite rules above. In this case the fourth rule takes effect and passes the ID to a script called entry.php.
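The sorting described above might look something like the sketch below. The real script is PHP and isn’t shown here, so this is a guess at its shape written in Python: the function name dispatch, the MONTHS table and the tuples it returns are all hypothetical. It also assumes the old month value can arrive with or without a hyphen between the month and the year.

```python
import re
from urllib.parse import parse_qs

# Hypothetical lookup table; the real script's month handling is not shown.
MONTHS = ["jan", "feb", "mar", "apr", "may", "jun",
          "jul", "aug", "sep", "oct", "nov", "dec"]


def dispatch(query_string):
    """Classify an archive request by its query string.

    Returns ("redirect", url) for old permalinks, otherwise
    ("list", year, month, day) with None for any part not given.
    """
    # Keep the first value of each parameter.
    params = {key: values[0] for key, values in parse_qs(query_string).items()}

    # Old permalink (id=N): redirect to the new entries URL.
    if "id" in params and re.fullmatch(r"[0-9]+", params["id"]):
        return ("redirect", "archive/entries/" + params["id"])

    # New style: year, optionally with month and day. Checked before the
    # old style because both carry a month parameter.
    if "year" in params:
        return ("list", params["year"], params.get("month"), params.get("day"))

    # Old style: three-letter month plus four-digit year in one value.
    # Assumption: a hyphen between the two is optional.
    if "month" in params:
        match = re.fullmatch(r"([a-z]{3})-?([0-9]{4})", params["month"].lower())
        if match and match.group(1) in MONTHS:
            number = MONTHS.index(match.group(1)) + 1
            return ("list", match.group(2), "%02d" % number, None)

    # Nothing recognised: fall back to the complete archive.
    return ("list", None, None, None)
```

Checking for year before month matters, since the new-style requests carry a month parameter too and would otherwise be mistaken for the old style.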

And the reason none of this is actually deployed yet? Well, that’s because I left out a line that’s necessary on the Linux server that SoylentRed runs on but isn’t necessary on my own Windows machine. The line is Options +FollowSymLinks and I have no clue what it has to do with mod_rewrite. Now that I know, it’s probably just a matter of uploading. But if there’s anything I’ve learned from the last year and a half of SoylentRed, it’s never to upload anything until I’m prepared to fix whatever goes wrong this time.
