Archiving the CABE website

On 20 October 2010, the government announced that funding for CABE had been withdrawn as part of the Comprehensive Spending Review. Put simply, CABE would cease to exist from 1 April 2011.

So the CABE website faced an interesting challenge. What happens to the digital assets of a quango when it shuts? There were very few precedents.

The CABE site is a large, popular site that is cherished by its users:

5,000+ pages of content
3,000+ unique visits per day (ABCe-audited in October 2010)
200,000+ publication downloads each year
A large international audience (38% of visits are from outside the UK)
Extremely positive user feedback.

This is the story of how I archived the CABE website. It’s not the greatest story ever told, but it might be useful if you’re going through the same thing.

This post is the first of 3 about archiving the CABE website - read more about meeting the content challenge and overcoming the functional drawbacks.

The National Archives’ archive

Continuing to run the CABE site beyond 1 April 2011 was never an option. The required resources would simply not be available.

My first plan was to convert the website into 5,000 PDFs and host these using some dirt cheap hosting. But luckily, the National Archives had already been archiving our site since 2004. They agreed to work with us to create a final version for our permanent archive.

Their archiving engine is incredible (it uses the same technology as the infamous Wayback Machine). Here’s what it captures:

Every single page
Every image and diagram
All linked files, such as PDF, Word and PowerPoint documents
All the CSS and JavaScript files (essential for the design).

So by using the National Archives, all of CABE’s web content would continue to be available for the foreseeable future. Cue sigh of relief.

Functional drawbacks

But it was not all good news. Digging around uncovered two serious drawbacks:

Internal search breaks
The golden rule for archiving websites is that server-side processing does not work in the archived copy, making search an impossible dream. This is a big deal for us, as search behaviour is critical to a 5,000-page information-based site.
External links stop working
Links to external pages don’t actually link to external pages in the archives; they point to archived versions of those pages. So thousands of links on our site would break, and nothing erodes trust faster than a host of broken links.
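
To get a feel for the scale of the external link problem, it helps to list every link on a page that points off-site. Here is a minimal sketch in Python of how that might be done. It is an illustration only, not the tool I used: the names LinkCollector and external_links are made up for this example, and it assumes you already have the page's HTML as a string.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin, urlparse


class LinkCollector(HTMLParser):
    """Collects the href value of every <a> tag in a page (hypothetical helper)."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.hrefs.append(value)


def external_links(html, page_url):
    """Return absolute http(s) URLs on the page that point to a different host."""
    collector = LinkCollector()
    collector.feed(html)
    site_host = urlparse(page_url).netloc.lower()
    found = []
    for href in collector.hrefs:
        # Resolve relative links against the page's own URL.
        absolute = urljoin(page_url, href)
        parts = urlparse(absolute)
        # Skip mailto:, javascript: and similar; keep only off-site web links.
        if parts.scheme in ("http", "https") and parts.netloc.lower() != site_host:
            found.append(absolute)
    return found


if __name__ == "__main__":
    sample = '<a href="/about">About</a> <a href="http://example.org/report">Report</a>'
    for link in external_links(sample, "http://www.cabe.org.uk/news"):
        print(link)
```

Run across a whole site, a list like this shows which external links are worth checking before the final crawl.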

The other drawbacks were mostly offset by the closure of CABE. Commenting, user surveys and RSS feeds are all redundant if you can’t update the site. Publication ordering is no longer necessary when you can’t fulfil orders. You don’t need to collect email subscriptions if you stop sending email newsletters. And no one needs to log in any more.

Read about how I dealt with the functional drawbacks in more detail.

Content challenges

Getting 5,000 pages of content ready for permanent archiving in less than 6 weeks was the biggest user experience challenge that I’ve ever faced.

There are huge but subtle differences between the website of a living organisation and that of a dead one. Out-of-date information needs removing, calls to action need deleting and all ‘about us’ sections must be rewritten.

It was fantastically difficult, but several things made it manageable:

Avoiding a tense rewrite
We decided not to rewrite everything in the past tense, as we felt that the archive banner provided users with enough context to make this unnecessary. So we only did this for key areas like the homepage.
Content strategy
When you need to quickly update your site, nothing beats being up to date in the first place. We were, and that was entirely down to my existing content strategy.
Management buy-in and staff contribution
CABE’s senior management mobilised staff, who responded to the challenge by dropping everything to review and update the web content. Incredible.
Tools and workflows
Nothing special, but very effective – Excel, Basecamp, Word, Dreamweaver, email, Drupal, PureText and Firefox with some add-ons.

Read about how I dealt with these content challenges in more detail.

Final Thoughts

As I write this, the final version of the site is being spidered by the archive engines. We’ll have the archived versions back in a few weeks, at which point we’ll redirect all our existing URLs to the archived versions (if all goes well).
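
For anyone planning the same redirect step, here is a rough sketch in Python of building a redirect map from live URLs to archived ones. It assumes the archive uses the Wayback-style address pattern of prefix, then capture timestamp, then original URL. The prefix, the timestamp and the function names below are all placeholders for illustration, not the National Archives' real values, so check the actual address format with your archive before relying on it.

```python
def archived_url(original_url, timestamp, archive_prefix):
    """Build a Wayback-style archived address: <prefix>/<timestamp>/<original URL>.

    All three inputs are assumptions supplied by the caller.
    """
    return f"{archive_prefix.rstrip('/')}/{timestamp}/{original_url}"


def redirect_map(urls, timestamp, archive_prefix):
    """Map each live URL to the archived address it should redirect to."""
    return {url: archived_url(url, timestamp, archive_prefix) for url in urls}


if __name__ == "__main__":
    # Placeholder values: a made-up archive host and capture timestamp.
    pages = ["http://www.cabe.org.uk/", "http://www.cabe.org.uk/publications"]
    mapping = redirect_map(pages, "20110101000000", "https://archive.example.org")
    for old, new in mapping.items():
        print(old, "->", new)
```

A map like this can then be turned into whatever redirect rules your web server supports.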

Running the CABE site for the last 3 years has been an incredible journey. I thought that I knew about user experience design when I started, but nothing prepared me for how much I would learn by talking day-in, day-out to both users and stakeholders about what they wanted (and needed) from a website.

Now it’s time for the next challenge…

This post is the first of 3 about archiving the CABE website. I’ve written in more detail about meeting the content challenge and overcoming the functional drawbacks. I’d love to know what you think - let me know on @myddelton.