Technical drawbacks in the CABE archive

The National Archives’ archive of the CABE website reproduces all our pages, images, CSS and JavaScript. But it fails in two areas that are critical to the user experience – internal search doesn’t work and links to external sites break.

This post details the workarounds for these (and other) issues.

This is a follow-up to my earlier post about archiving the CABE website.

Internal search

Internal search requires server-side processing. The web server takes your search term, passes it to the search engine, waits for the results and then displays these on your site. But there is no server-side processing in the National Archives, so internal search breaks.

Now, I’m proud of the clarity of our site structure, but nothing beats search for finding known items – especially in a 5,000 page website.

So I replaced the search box with a link to a Google Custom Search. It’s an extra click but if you really want to find the thing you’re looking for, you can. There may be a better solution - embed an iframe with the Google box straight into your site header. But I had no time to test it.

External links

There are two ways to link to external sites from an archived website.

The first option is to treat external links normally. This way, clicking on Planning Policy Statement 3: Housing takes you to the current version of the policy. But websites are always changing. The policy may be superseded or the page itself might be deleted, and both actions break the user experience horribly.

The second way is to take a snapshot of the linked page and link to that. That way, clicking on Planning Policy Statement 3: Housing takes you to the version that was live when the archive was created. This is even worse – you don’t want to be telling users to meet the requirements of discontinued policies!

So we removed all external links from the CABE website. Thousands of them. It was painful, but you can simplify the process:

  • Make it easy by highlighting external links with a user stylesheet
  • Do the same in your WYSIWYG (Drupal’s FCK editor has its own stylesheet)
  • Split your content inventory between willing helpers (incentivise!)
  • Save time by opening batches of 30 links in Firefox with LaunchClipboard
  • Use Xenu Link Sleuth to mop up the links you miss.
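
If you would rather script the first pass than spot links by eye, here is a minimal sketch of the idea. It is not the process we used at CABE. It uses only the Python standard library, and the function name, sample HTML and example.gov.uk address are made up for illustration.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin, urlparse


class LinkCollector(HTMLParser):
    """Collect the href of every <a> tag in an HTML fragment."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                self.hrefs.append(href)


def external_links(html, page_url, site_host):
    """Return absolute URLs in `html` that point away from `site_host`."""
    collector = LinkCollector()
    collector.feed(html)
    found = []
    for href in collector.hrefs:
        absolute = urljoin(page_url, href)  # resolve relative links
        parts = urlparse(absolute)
        if parts.scheme not in ("http", "https"):
            continue  # skip mailto:, javascript: and similar
        host = parts.netloc.lower()
        if host == site_host or host.endswith("." + site_host):
            continue  # internal link, leave it alone
        found.append(absolute)
    return found


# Hypothetical sample page: one internal, one external and one email link.
sample = (
    '<p><a href="/housing">Housing</a> and '
    '<a href="http://www.example.gov.uk/pps3">PPS3</a> and '
    '<a href="mailto:info@example.org">email</a></p>'
)
print(external_links(sample, "http://www.cabe.org.uk/about", "cabe.org.uk"))
# prints ['http://www.example.gov.uk/pps3']
```

Run over an export of your pages, this gives you a list to split between helpers. It only looks at anchor tags, so a link checker is still needed to catch the rest.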

Where we really needed to leave an external link in place we left the bare, unlinked URL on the page for the user to copy and paste. Needs must.
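
The unlinking itself can also be sketched in code. This is a rough illustration and not what we ran on the CABE site. It uses a regular expression, which is fragile on messy HTML. The `unlink_external` function and its `keep_url` flag are hypothetical, and putting the bare URL in brackets after the link text is just one way to present it.

```python
import re
from urllib.parse import urlparse

# Matches simple anchors with a double-quoted href (an assumption about the markup).
ANCHOR = re.compile(
    r'<a\s[^>]*href="([^"]+)"[^>]*>(.*?)</a>',
    re.IGNORECASE | re.DOTALL,
)


def unlink_external(html, site_host, keep_url=False):
    """Strip external <a> tags; optionally leave the bare URL in brackets."""

    def replace(match):
        href, text = match.group(1), match.group(2)
        host = urlparse(href).netloc.lower()
        if not host or host == site_host or host.endswith("." + site_host):
            return match.group(0)  # relative or internal link: untouched
        return f"{text} ({href})" if keep_url else text

    return ANCHOR.sub(replace, html)


# Hypothetical sample with one internal and one external link.
page = '<a href="/housing">Housing</a> and <a href="http://www.example.gov.uk/pps3">PPS3</a>'
print(unlink_external(page, "cabe.org.uk"))
print(unlink_external(page, "cabe.org.uk", keep_url=True))
```

The first call drops the external link and keeps its text. The second keeps the address as plain text for the user to copy and paste.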

I’ll write more about Xenu Link Sleuth soon. But if you run a site, you need it.

301 redirects

It’s not enough to just archive the content. Users need to be able to find it. Google sends us 60% of our traffic and 20% comes from other sites, so 80% of our traffic relies on existing URLs.

We decided to keep cabe.org.uk for 2 years so that we can use 301 redirects to map these existing URLs to their archived equivalents. This means that Google will “understand” that the site has moved (and update its results) and that links from other sites will keep working until 2013.

We also used redirects for a few critical pages. cabe.org.uk/redirects/archive links to information about archiving the CABE site which will change over time. Using a 301 redirect on our root domain lets us repoint this link to a new URL whenever we want to. (The same applies to contact information).
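
Here is a minimal sketch of the two kinds of redirect, written as a lookup function and not as real server configuration. The archive host, the snapshot timestamp and the target addresses are placeholders. The "prefix plus original URL" pattern is an assumption based on how Wayback-style archives build their addresses.

```python
# Placeholder values: the real archive host and snapshot timestamp are not shown here.
ARCHIVE_PREFIX = "http://archive.example.org/20110104000000/"
OLD_SITE = "http://www.cabe.org.uk"

# Short, stable URLs whose targets can be repointed later (targets are hypothetical).
REPOINTABLE = {
    "/redirects/archive": "http://www.example.org/about-the-cabe-archive",
    "/redirects/videos": "http://www.example.org/cabe-videos",
}


def redirect_for(path):
    """Return (status, location) for a request to a path on the old site."""
    if path in REPOINTABLE:
        return 301, REPOINTABLE[path]
    # Everything else maps to the archived copy of the same URL.
    return 301, ARCHIVE_PREFIX + OLD_SITE + path


print(redirect_for("/housing"))
print(redirect_for("/redirects/archive"))
```

Changing where a critical page points then means editing one entry in the table, while every other URL follows the same rule.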

Videos

In the time available, it was impossible to tell whether videos would work in the National Archives. As a fallback we included this above every video:

If you cannot see the video(s) below go to
cabe.org.uk/redirects/videos

Ugly, but effective.

Kill your darlings

Finally, our developers pitched in with some housekeeping to strip out any functionality that could trip up the user experience:

  • Replaced global and local search boxes
  • Disabled the registration system
  • Turned off all commenting features
  • Stripped out our email signup forms
  • Redacted the Google maps
  • Retired our Autogenerated-PDF plugin
  • Ditched our lovely social media buttons
  • Switched off the publication ordering system
  • Parked the RSS feeds
  • Made final changes to the header and footer

The final version

Sad though it was to see these features go, they were just enhancements to the experience of using the CABE site. The core function of the site was always to publish our technical advice for professionals to use in their work.

That function remains intact.

In archiving the site I owe two massive debts of gratitude. First to the staff at the National Archives who have been knowledgeable, helpful and supportive through what was a manic time. Secondly, to Precedent Communications, in particular Andrew Travers and Adam Elleston.

Get in touch with me on @myddelton if you have any questions or comments.

Content challenges in the CABE archive

The biggest challenge in archiving the CABE website was reviewing 5,000 pages in 6 weeks. This included removing out of date information, deleting calls to action and making it read like an archive rather than a going concern.

The biggest decision turned out to be the simplest. Grammar. We decided not to rewrite the whole site in the past tense as the whole thing is framed by a banner stating that it’s a permanent archive. This freed up time to actually review content rather than just going crazy with a red pen.

This is a follow-up to my earlier post about archiving the CABE website.

Content strategy will save you

We’ve had a strong content strategy at CABE since launching our redeveloped site in early 2009. Of course, no one was thinking about archiving the site when we created this. Yet several elements turned out to be really helpful:

  1. Content inventory and URL design
    Our rolling content inventory includes data from both Drupal (supply-side) and Google Analytics (demand-side). The friendly URLs meant our inventory could be easily sorted. So we knew exactly what we were working with. Essential.
  2. Content types
    Our 10 different content types each have individual content templates. So with minimal work we knew that massive sections of the site (e.g. over a thousand design reviews) didn’t need to be touched at all. Huge win.
  3. Style guide
    Our style guide insists on “timeless” content as we don’t have the resources to revisit content often. This means that much of the remaining content didn’t need any more work to be archive-ready. Another huge win.
  4. Editorial calendar
    For all the content that needs regular updating we have an editorial calendar. Most content had been regularly updated since relaunch in 2009, so very few sections needed a major overhaul.
  5. Page owners
    Every section (e.g. housing) has a named owner. So people were ready to review content, rather than having to learn new stuff at the 11th hour.
  6. Publishing workflow
    All content is reviewed by our communications team before upload. Yes, this creates a bottleneck, but it also ensures content quality.
  7. Accessibility and web standards
    We use simple, well-structured HTML for most of our content and very little AJAX or complicated functionality. So 99% of our content worked in the archive with no technical issues (major exceptions being the videos).
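
To make the first item concrete, here is a minimal sketch of a content inventory that joins supply-side and demand-side data by URL. Our real inventory lived in Excel, so treat this as an illustration only. The paths, content types and visit counts below are invented.

```python
from collections import Counter


def build_inventory(cms_pages, analytics_visits):
    """Join CMS pages (supply) with visit counts (demand), sorted by URL path."""
    rows = []
    for path, content_type in cms_pages.items():
        rows.append({
            "path": path,
            # Friendly URLs make the first path segment a usable section name.
            "section": path.strip("/").split("/")[0] or "home",
            "type": content_type,
            "visits": analytics_visits.get(path, 0),
        })
    return sorted(rows, key=lambda row: row["path"])


# Invented sample data standing in for a CMS export and an analytics export.
cms_pages = {
    "/housing/standards": "guidance",
    "/design-review/a-school": "design review",
    "/housing": "section",
    "/": "home",
}
analytics_visits = {"/housing": 1200, "/design-review/a-school": 40}

inventory = build_inventory(cms_pages, analytics_visits)
for row in inventory:
    print(row["path"], row["section"], row["type"], row["visits"])

# Pages per section, handy for sharing out the review work.
print(Counter(row["section"] for row in inventory))
```

Sorting by path groups each section together, and the section counts show how much work sits with each page owner.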

The lesson? Taking a systematic approach to your content strategy will pay off in ways that you cannot imagine. No wonder it was the hot topic of 2010.

Getting management buy-in

If your managers don’t support an all-staff process then it won’t happen.

The week that I pitched my process to CABE management, I also went to the London UX Bookclub. We were reading Dan Roam’s The Back of the Napkin, which is about using sketches to communicate business ideas.

CABE is a design organisation and, unsurprisingly, it’s full of visual thinkers. I learned this the hard way early on by producing big reports that didn’t get read.

So I used my pitch to sketch a Gantt chart showing what needed to happen.

It was a gamble, but it worked. We had a great discussion about the key issues – was it necessary to archive at all, were the dates set in stone, who needed to do what. They asked tough questions, but we agreed an archive process that everyone was happy with.

The archive process

We agreed the archive process on 22 November 2010 and the National Archives’ final deadline was 4 January 2011. Which meant 6 weeks to:

  • Audit all the content by 26 November
  • Agree actions with 26 teams by 3 December
  • Receive all final content amends by 10 December
  • Revise homepage and top level sections by 17 December
  • Incorporate feedback from senior management by 24 December
  • Make all final content changes by the deadline of 23:59 on 3 January
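
The arithmetic behind that schedule is easy to check. This is a small sketch using Python's standard `datetime` module, with the dates taken from the list above.

```python
from datetime import date, timedelta

agreed = date(2010, 11, 22)
deadline = date(2011, 1, 4)

days = (deadline - agreed).days
weeks, extra_days = divmod(days, 7)
print(days, weeks, extra_days)  # prints 43 6 1

# The five interim milestones are one week apart, starting on 26 November.
first = date(2010, 11, 26)
milestones = [first + timedelta(weeks=i) for i in range(5)]
for milestone in milestones:
    print(milestone.strftime("%A %d %B %Y"))
```

So the process ran for 43 days, just over 6 weeks, with a milestone every Friday up to Christmas Eve.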

Although this meant working through Christmas, it also meant no interruptions.

Tools of the trade

The content strategy made it manageable, the process made it tangible and all that was left was to make it doable. This is all about tools and workflows.

Excel is Microsoft’s killer application – even John Gruber says so – as it gives you programmer skills without having programmer chops. The main use here was to manage the global content inventory and easily farm sections of work out to different teams.
 
Basecamp helped me keep track of more than 150 individual actions from 26 content meetings. Every meeting was recorded in a Basecamp message and every action lived on a to-do list. This kept me sane – even though, as the only person in the project, I was effectively talking to myself!

My big guilty secret is not using the Drupal CMS to manage web content. Instead we use everyday tools like Microsoft Word, shared drives and email – after all, if it works for all other content, why break it? This also keeps content management separate from website technology and lets technophobic staff edit pages directly.

The secret is to start with well-structured HTML. Copy text from a webpage into Word, edit it in Word, paste-special it into Dreamweaver and it magically retains its original HTML structure (paragraphs, headings, lists). Normal staff work in Word while web editors convert their output to HTML in seconds.

And, of course, PureText. If you don’t use it, you should do. For everything.

I’m going to write about harnessing the brutal power of Excel, hacking together effective workflows and lamenting the state of CMS UX soon.

Ready to die

By 4 January 2011 we had reviewed every piece of content on the site, added new top level content and the CABE website was ready for archiving. Yes, there will be some errors, but it still feels like a huge achievement.

It would not have been possible without the enthusiasm and commitment of the amazing staff at CABE. I actually feel pretty emotional wrapping this up, so let’s leave it there. You know who you are.

Get in touch with me on @myddelton if you have any questions or comments.

Archiving the CABE website

On 20 October 2010, the government announced that funding for CABE had been withdrawn as part of the Comprehensive Spending Review. Put simply, CABE would cease to exist from 1 April 2011.

So the CABE website faced an interesting challenge. What happens to the digital assets of a quango when it shuts? There were very few precedents.

The CABE site is a large, popular site that is cherished by its users:

  • 5,000+ pages of content
  • 3,000+ unique visits per day (ABCe-audited in October 2010)
  • 200,000+ publication downloads each year
  • A large international audience (38% of visits are from outside the UK)
  • Extremely positive user feedback.

This is the story of how I archived the CABE website. It’s not the greatest story ever told, but it might be useful if you’re going through the same thing.

This post is the first of 3 about archiving the CABE website - read more about meeting the content challenge and overcoming the functional drawbacks.

The National Archives’ archive

Continuing to run the CABE site beyond 1 April 2011 was never an option. The required resources would simply not be available.

My first plan was to convert the website into 5,000 PDFs and host these using some dirt cheap hosting. But luckily, the National Archives had already been archiving our site since 2004. They agreed to work with us to create a final version for our permanent archive.

Their archiving engine is incredible (it uses the same technology as the infamous Wayback Machine). Here’s what it captures:

 

  • Every single page
  • Every image and diagram
  • All linked files, such as PDF, Word, PowerPoint
  • All the CSS and JavaScript files (essential for the design).

So by using the National Archives, all of CABE’s web content would continue to be available for the foreseeable future. Cue sigh of relief.

Functional drawbacks  

But it was not all good news. Digging around uncovered two serious drawbacks:

  • Internal search breaks
    The golden rule for archiving websites is that server-side processing does not work, making search an impossible dream. This is a big deal for us, as search behaviour is critical to a 5,000 page information-based site.
  • External links stop working
    Links to external pages don’t actually link to external pages in the archives, they point to archived versions of those pages. So thousands of links on our site would break, and nothing erodes trust faster than a host of broken links.

The other drawbacks were mostly offset by the closure of CABE. Commenting, user surveys and RSS feeds are all redundant if you can’t update the site. Publication ordering is no longer necessary when you can’t fulfill orders. You don’t need to collect email subscriptions if you stop sending email newsletters. And no one needs to log in any more.

Read about how I dealt with the functional drawbacks in more detail.

Content challenges

Getting 5,000 pages of content ready for permanent archiving in less than 6 weeks was the biggest user experience challenge that I’ve ever faced.

There are huge but subtle differences between the website of a living organisation and that of a dead one. Out of date information needs removing, calls to action need deleting and all ‘about us’ sections must be rewritten.

It was fantastically difficult, but several things made it manageable:

  • Avoiding a tense rewrite
    We decided not to rewrite everything in the past tense, as we felt that the archive banner provided users with enough context to make this unnecessary. So we only did this for key areas like the homepage.
  • Content strategy
    When you need to quickly update your site, nothing beats being up-to-date in the first place. That we were was entirely down to my existing content strategy.
  • Management buy-in and staff contribution
    CABE’s senior management mobilised staff, who responded to the challenge by dropping everything to review and update the web content. Incredible.
  • Tools and workflows
    Nothing special, but very effective – Excel, Basecamp, Word, Dreamweaver, email, Drupal, PureText and Firefox with some add-ons.

Read about how I dealt with these content challenges in more detail.

Final Thoughts

As I write this, the final version of the site is being spidered by the archive engines. We’ll have the archived versions back in a few weeks, at which point we’ll redirect all our existing URLs to the archived versions (if all goes well).

Running the CABE site for the last 3 years has been an incredible journey. I thought that I knew about user experience design when I started, but nothing prepared me for how much I would learn by talking day-in, day-out to both users and stakeholders about what they wanted (and needed) from a website.

Now it’s time for the next challenge…

This post is the first of 3 about archiving the CABE website. I’ve written in more detail about meeting the content challenge and overcoming the functional drawbacks. I’d love to know what you think - let me know on @myddelton.