Technical drawbacks in the CABE archive

The National Archives’ archive of the CABE website reproduces all our pages, images, CSS and JavaScript. But it fails in two areas that are critical to the user experience: internal search doesn’t work and links to external sites break.

This post details the workarounds for these (and other) issues.

This is a follow-up to my earlier post about archiving the CABE website.

Internal search

Internal search requires server-side processing. The web server takes your search term, passes it to the search engine, waits for the results and then displays these on your site. But there is no server-side processing in the National Archives, so internal search breaks.

Now, I’m proud of the clarity of our site structure, but nothing beats search for finding known items – especially in a 5,000 page website.

So I replaced the search box with a link to a Google Custom Search. It’s an extra click, but if you really want to find the thing you’re looking for, you can. There may be a better solution: embedding an iframe with the Google box straight into the site header. But I had no time to test it.

External links

There are two ways to link to external sites from an archived website.

The first option is to treat external links normally. This way, clicking on Planning Policy Statement 3: Housing takes you to the current version of the policy. But websites are always changing. The policy may be superseded or the page itself might be deleted, and both actions break the user experience horribly.

The second way is to take a snapshot of the linked page and link to that. That way, clicking on Planning Policy Statement 3: Housing takes you to the version that was live when the archive was created. This is even worse – you don’t want to be telling users to meet the requirements of discontinued policies!

So we removed all external links from the CABE website. Thousands of them. It was painful, but you can simplify the process:

  • Make it easy by highlighting external links with a user stylesheet
  • Do the same in your WYSIWYG (Drupal’s FCK editor has its own stylesheet)
  • Split your content inventory between willing helpers (incentivise!)
  • Save time by opening batches of 30 links in Firefox with LaunchClipboard
  • Use Xenu Link Sleuth to mop up the links you miss.
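If you would rather script the first pass, here is a minimal sketch of how external links could be picked out of a page. It is not what we used (we relied on the tools above), and the function name, the class name and the default host are assumptions made for illustration.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin, urlparse


class ExternalLinkFinder(HTMLParser):
    """Collect <a href> targets that point away from the site's own host."""

    def __init__(self, page_url, site_host):
        super().__init__()
        self.page_url = page_url
        self.site_host = site_host
        self.external = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        href = dict(attrs).get("href")
        if not href:
            return
        # Resolve relative links against the page they appear on.
        absolute = urljoin(self.page_url, href)
        parts = urlparse(absolute)
        # Ignore mailto:, javascript: and other non-web schemes.
        if parts.scheme not in ("http", "https"):
            return
        host = parts.netloc.lower()
        is_internal = host == self.site_host or host.endswith("." + self.site_host)
        if not is_internal:
            self.external.append(absolute)


def find_external_links(html, page_url, site_host="cabe.org.uk"):
    """Return the external link URLs found in one page of HTML, in order."""
    finder = ExternalLinkFinder(page_url, site_host)
    finder.feed(html)
    return finder.external
```

Run over each page in the content inventory, this gives a checklist of links to remove. It only sees links that are present in the HTML source, so anything added by JavaScript would still need a manual check.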

Where we really needed to leave an external link in place we left the bare, unlinked URL on the page for the user to copy and paste. Needs must.

I’ll write more about Xenu Link Sleuth soon. But if you run a site, you need it.

301 redirects

It’s not enough to just archive the content. Users need to be able to find it. Google sends us 60% of our traffic and 20% comes from other sites, so 80% of our traffic relies on existing URLs.

We decided to keep cabe.org.uk for 2 years so that we can use 301 redirects to map these existing URLs to their archived equivalents. This means that Google will “understand” that the site has moved (and update its results) and that links from other sites will keep working until 2013.
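To make the mapping concrete, here is a rough sketch of a catch-all redirect written as a small WSGI application. It assumes every old URL has an archived copy at the same path under a single snapshot prefix. The prefix below (in particular the timestamp) is a placeholder rather than the real archive address, and the function names are made up for this example.

```python
# Placeholder snapshot prefix: the timestamp is an assumption, not the real one.
ARCHIVE_PREFIX = "http://webarchive.nationalarchives.gov.uk/20110101000000/"
ORIGINAL_SITE = "http://www.cabe.org.uk"


def archived_url(path, query=""):
    """Map a path on the old site to its assumed archived equivalent."""
    url = ORIGINAL_SITE + path
    if query:
        url += "?" + query
    return ARCHIVE_PREFIX + url


def redirect_app(environ, start_response):
    """WSGI app that answers every request with a 301 to the archive."""
    path = environ.get("PATH_INFO") or "/"
    query = environ.get("QUERY_STRING", "")
    headers = [("Location", archived_url(path, query)), ("Content-Length", "0")]
    start_response("301 Moved Permanently", headers)
    return [b""]
```

In practice the same rule is often a one-line rewrite in the web server configuration. The point of the sketch is only that the old path is carried over unchanged, so deep links from Google and other sites land on the matching archived page.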

We also used redirects for a few critical pages. cabe.org.uk/redirects/archive links to information about archiving the CABE site, which will change over time. Using a 301 redirect on our root domain lets us repoint this link to a new URL whenever we want to. (The same applies to contact information.)
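One way to picture these named redirects is as a small lookup table that can be edited without touching the archived pages. This is a sketch only: the target URLs are hypothetical (example.org stands in for wherever the information ends up) and the names are invented for illustration.

```python
# Hypothetical targets: the real destination URLs are not given in this post.
NAMED_REDIRECTS = {
    "/redirects/archive": "http://example.org/about-the-cabe-archive",
    "/redirects/videos": "http://example.org/cabe-videos",
}


def resolve(path):
    """Return (status, location) for a named redirect, or None if unknown."""
    target = NAMED_REDIRECTS.get(path.rstrip("/"))
    if target is None:
        return None
    return ("301 Moved Permanently", target)
```

Repointing a link then means changing one value in the table. The archived pages keep pointing at the same cabe.org.uk/redirects/ address.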

Videos

In the time available, it was impossible to tell whether videos would work in the National Archives. As a fallback we included this above every video:

If you cannot see the video(s) below go to
cabe.org.uk/redirects/videos

Ugly, but effective.

Kill your darlings

Finally, our developers pitched in with some housekeeping to strip out any functionality that could trip up the user experience:

  • Replaced global and local search boxes
  • Disabled the registration system
  • Turned off all commenting features
  • Stripped out our email signup forms
  • Redacted the Google maps
  • Retired our Autogenerated-PDF plugin
  • Ditched our lovely social media buttons
  • Switched off the publication ordering system
  • Parked the RSS feeds
  • Made final changes to the header and footer

The final version

Sad though it was to see these features go, they were just enhancements to the experience of using the CABE site. The core function of the site was always to publish our technical advice for professionals to use in their work.

That function remains intact.

In archiving the site I owe two massive debts of gratitude. First to the staff at the National Archives who have been knowledgeable, helpful and supportive through what was a manic time. Secondly, to Precedent Communications, in particular Andrew Travers and Adam Elleston.

Get in touch with me on @myddelton if you have any questions or comments.