Forgetfulness prompts urgent site rework
I received an email a few days ago from a helpful reader informing me that he’d tried to post a comment but failed. Initially I was worried that Perl was “mis-firing again(magpiebrain - Comments b0rked, comments fixed?)”:http://www.magpiebrain.com/archives/2005/03/21/borkedcomments, until I noticed the URL he was trying to post to. A while ago I “de-cruftified(De-crufty PHP Movable Type goodness)”:http://www.magpiebrain.com/archives/2004/04/26/decruft this site’s URLs. By default Movable Type generates simplistic URLs named after the entry ID (e.g. @000334.html@). Not only is this not terribly forward-thinking (what if I moved to PHP?), but the URLs don’t really match the structure of the information.
What I did back then was first to remove all URL suffixes. Next, I moved the URLs into a date-based format - for example @/2005/04/13@ would be used by all posts made in 2005 on the 13th of April. This left me with nice URLs, but I neglected to handle the fact that I had over 200 posts using the old form. The old pages were still generating hits, but they were stale copies of the posts. When I changed the way comments are posted, comment posting from these pages started mis-firing.
The obvious solution was to set up Apache redirection rules for all the posts. To make life easier, I added an easily grep-able HTML comment to all the individual post pages. Next, I ran a recursive find over the archive structure, grepping for the comment:
find . -type f -not -name "*.html" -print | xargs grep "ENTRY "
The @-not@ negated what followed, excluding all files ending in @.html@. This wasn’t strictly necessary, but the HTML comment I put in wasn’t specific enough, so I was getting a few false positives from the old HTML pages. After this query, I was left with several hundred results like this:
./2005/04/13/this_is_a_post: ENTRY 000342
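For anyone wanting to try the search before running it on a real archive, here is a minimal sketch that runs it against a throwaway directory tree. The exact form of the marker (@<!-- ENTRY 000342 -->@), the sample file names and the @find_entries@ helper are assumptions made for illustration. It also uses @-exec ... {} +@ rather than @xargs@, so paths containing spaces are not split, and @grep -H@ so the file name is printed even when only one file matches.

```shell
#!/bin/sh
# Build a throwaway archive tree (hypothetical paths and marker format).
demo=$(mktemp -d)
mkdir -p "$demo/2005/04/13"
printf '<!-- ENTRY 000342 -->\n<p>New-style page</p>\n' > "$demo/2005/04/13/this_is_a_post"
printf '<!-- ENTRY 000342 -->\n<p>Old-style page</p>\n' > "$demo/000342.html"

# find_entries DIR: print "path:matching line" for every file under DIR
# that does not end in .html and contains the marker text.
find_entries() {
    (cd "$1" && find . -type f ! -name '*.html' -exec grep -H 'ENTRY ' {} +)
}

find_entries "$demo"
# prints: ./2005/04/13/this_is_a_post:<!-- ENTRY 000342 -->

rm -rf "$demo"
```

The old @000342.html@ page carries the same marker but is skipped by the name test, which is the false positive the negation is there to avoid.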
I fired up “SubEthaEdit(The OSX collaborative Text Editor)”:http://www.codingmonkeys.de/subethaedit/, and a quick regexp later I had the following for each of the 300 or so matches:
RedirectMatch permanent /000342\.html$ /2005/04/13/this_is_a_post
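The same rewrite can be done outside a text editor. The following is a sketch of one possible @sed@ expression, not the regexp actually used. The @to_redirects@ helper is a hypothetical name. The sketch assumes that the archive sits at the site root, so the leading @.@ from the find output is dropped to give a target starting with a slash, and it escapes the dot and anchors the pattern so that only the exact old file name matches.

```shell
#!/bin/sh
# to_redirects: read "path:... ENTRY id ..." lines on stdin and write one
# RedirectMatch rule per line; lines that do not fit the shape are dropped.
to_redirects() {
    sed -n -E 's|^\.(/[^:]+):.*ENTRY ([0-9]+).*$|RedirectMatch permanent /\2\\.html$ \1|p'
}

# Example using the sample result line from the text.
printf '%s\n' './2005/04/13/this_is_a_post: ENTRY 000342' | to_redirects
# prints: RedirectMatch permanent /000342\.html$ /2005/04/13/this_is_a_post
```

Piping the find and grep output straight into the function and redirecting the result to a file gives a list of rules that can be reviewed before being pasted anywhere.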
This was then pasted into my @.htaccess@ file, and I could delete the old HTML files, safe in the knowledge that any Google hits for the old pages would get redirected to the new, de-crufty URLs.
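Before deleting anything, it may be worth checking that every old page really has a rule. This is a rough sketch of such a check, assuming the old pages are named after their numeric entry ID and that each rule mentions that ID. The @missing_rules@ helper and the paths in the example call are hypothetical.

```shell
#!/bin/sh
# missing_rules HTACCESS DIR
# Print every old-style page in DIR (named like 000342.html) whose entry ID
# does not appear on any RedirectMatch line in HTACCESS.
missing_rules() {
    for page in "$2"/[0-9]*.html; do
        [ -e "$page" ] || continue      # the glob matched nothing
        id=$(basename "$page" .html)
        grep -q "^RedirectMatch permanent .*$id" "$1" || printf '%s\n' "$page"
    done
}

# Example call (hypothetical paths); no output means every page is covered.
# missing_rules .htaccess /path/to/old/archive
```

Any file name it prints is a page that would start returning errors once deleted.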
While I was at it, I knocked up a favicon, some rollovers for the sidebar and entry navigation, and enhanced the side navigation to show the last few recent posts and del.icio.us links.
This entry was posted on Sunday, April 3rd, 2005 at 9:26 pm and is filed under General.