Automatic pull of content: some issues

Making Topic Pages Work

It seems so simple. You've got press releases that are clearly tagged by neighborhood (let's say the two possible neighborhoods are Capitol Hill and Atlas District). The Atlas District page should obviously have only Atlas District news, so you create a section on the Atlas District page that lists the three most recent press releases tagged to it. Your web developer quickly whips up something like this (examples from the excellent local blog Frozen Tropics):

[Figure: Sample (idealized) news feed. An automatic feed can look so simple.]
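The straightforward version of that block can be sketched in a few lines. This is only an illustration, not the code behind any real site: the `PressRelease` class, the `latest_for_tag` function, and the sample releases are all made up for the example.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass
class PressRelease:
    title: str
    tags: set[str]
    published: datetime


def latest_for_tag(releases, tag, limit=3):
    """Return the `limit` most recent releases carrying `tag`, newest first."""
    tagged = [r for r in releases if tag in r.tags]
    return sorted(tagged, key=lambda r: r.published, reverse=True)[:limit]


# Made-up sample data, for illustration only.
releases = [
    PressRelease("New streetcar stop", {"Atlas District"}, datetime(2009, 8, 1)),
    PressRelease("Market hours change", {"Capitol Hill"}, datetime(2009, 8, 2)),
    PressRelease("Street festival", {"Atlas District"}, datetime(2009, 8, 3)),
]
atlas_titles = [r.title for r in latest_for_tag(releases, "Atlas District")]
```

Notice that the only rules here are "has the tag" and "newest first". Everything this sketch ignores is where the trouble starts.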

Possible Issues

Seems easy enough, right? Sometimes the straightforward approach may be fine (especially for small sites), but you could wind up with something more like this if you're not careful:

[Figure: Sample (problem) news feed. Lots of problems can arise in automatic pulls of content if you're not careful.]

Here are some of the potential issues with larger sites:

Drafts and embargoed material

"this should not appear anywhere, in any channel, until published"

Let's say you're about to post a press release containing the menu for a new restaurant in the Atlas District, and you've agreed to post it after 7pm tonight. You'll be working on a draft beforehand so that it's ready to go at 7:00. Obviously, the press release shouldn't appear until after the approved time. This is a more significant issue than it appears: if you start exposing APIs and other means of sharing your content, the same rules should apply there too, rather than having developers recreate the rules (and potentially introduce errors) every time.
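One way to avoid recreating the rule in every channel is to put it in a single function that page blocks, feeds, and APIs all call. Here is a minimal sketch, assuming each release carries a `status` and an optional `publish_at` embargo time. Those field names, the status values, and the `is_visible` helper are assumptions made for the example.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Optional


@dataclass
class PressRelease:
    title: str
    status: str                     # "draft" or "published" (hypothetical values)
    publish_at: Optional[datetime]  # embargo time; None means no embargo


def is_visible(release, now):
    """The one visibility rule shared by page blocks, feeds, and any API."""
    if release.status != "published":
        return False
    return release.publish_at is None or now >= release.publish_at


def visible_releases(releases, now):
    """Filter a list of releases down to those that may appear anywhere."""
    return [r for r in releases if is_visible(r, now)]


# Made-up example: a menu release embargoed until 7:00pm.
menu = PressRelease("New restaurant menu", "published", datetime(2009, 8, 31, 19, 0))
draft = PressRelease("Work in progress", "draft", None)
before = visible_releases([menu, draft], now=datetime(2009, 8, 31, 18, 59))
after = visible_releases([menu, draft], now=datetime(2009, 8, 31, 19, 0))
```

Because every channel goes through `visible_releases`, a change to the embargo rule happens in one place.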

Editorial decisions

"yeah, but I don't want it on my page"

A press release is published that relates to both the Atlas District and Capitol Hill. Perhaps it's about a bicycle race that will result in street closings in Capitol Hill but will affect only parking in the Atlas District. The owner of the Atlas District page doesn't think it's significant enough to appear on the Atlas District page. This is a case where the tagging to Atlas District is correct, but there is a valid editorial decision not to include it on the Atlas District page (perhaps there's another, separate event there that should be in the top three). In this case, the press release should not be retagged to remove Atlas District, since for some purposes (such as enterprise search) you will want the correct tag.
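One way to honor the editorial decision without touching the tags is to keep a per-page list of suppressed items. The sketch below assumes that approach. The `TopicPage` class, its `excluded_ids` field, and the sample releases are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class PressRelease:
    id: int
    title: str
    tags: set[str]


@dataclass
class TopicPage:
    tag: str
    # IDs the page owner has chosen not to show; the tags stay untouched.
    excluded_ids: set[int] = field(default_factory=set)


def page_items(page, releases):
    """Releases tagged to the page's topic, minus the owner's suppressions."""
    return [r for r in releases if page.tag in r.tags and r.id not in page.excluded_ids]


# Made-up example data.
race = PressRelease(1, "Bicycle race closures", {"Capitol Hill", "Atlas District"})
opening = PressRelease(2, "Gallery opening", {"Atlas District"})
atlas = TopicPage("Atlas District", excluded_ids={1})
hill = TopicPage("Capitol Hill")

atlas_titles = [r.title for r in page_items(atlas, [race, opening])]
hill_titles = [r.title for r in page_items(hill, [race, opening])]
```

The race still carries the Atlas District tag, so search and other consumers of the tag see the correct data.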

Bad Tagging

"this tag is just wrong"

This one is virtually impossible to avoid when dealing with a large group of people submitting content. Let's say that a new person who does not know DC very well arrives, and mistakenly tags something to Capitol Hill instead of the Atlas District (perhaps mixing up 401 H St NE and 401 H St SE). Note that this is very different from the editorial decision issue, although at first blush they seem similar. In this case, the tagging is wrong and should be corrected (or, in the case of automated tagging, the rules should be changed).

Multilingual Issues

"don't show me partial results in another language"

A variety of issues can occur when pulling content in many languages, especially when, as is usually the case, different pieces of content are available in different languages. You can end up with too little content (if few items exist in the language of the page being displayed), or with unnecessary duplicate content (see Interleaving Languages).
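As a rough sketch of one possible policy (not the only reasonable one), a block could show only items in the page's language and hide itself when too few exist. The `language` field, the `minimum` threshold, and the sample items are assumptions made for this example.

```python
from dataclasses import dataclass


@dataclass
class PressRelease:
    title: str
    language: str  # hypothetical language code, e.g. "en" or "es"


def items_for_language(releases, language, minimum=2):
    """Return items in `language`, or [] (hide the block) if fewer than `minimum`."""
    matching = [r for r in releases if r.language == language]
    return matching if len(matching) >= minimum else []


# Made-up example data.
releases = [
    PressRelease("Street festival", "en"),
    PressRelease("Market hours change", "en"),
    PressRelease("Festival callejero", "es"),
]
english_block = items_for_language(releases, "en")
spanish_block = items_for_language(releases, "es")
```

Here the Spanish page gets no block at all rather than one lonely item padded out with English content.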

Broadcasted content

"I need this important information on all pages of the site"

If you have a lot of publishers and content, you may sometimes have content that should appear on all pages (broadcasts), regardless of which neighborhood the news is about (let's say a press release about Washington, DC overall and not specific to a neighborhood). What you *don't* want to do (but may indeed do in a crisis if this wasn't planned for) is tag the content to every neighborhood just to make it appear there, even though that tagging is not correct.
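If broadcasts are planned for, they can be modeled as their own flag rather than as a pile of incorrect tags. A minimal sketch, assuming a hypothetical `broadcast` field on each release:

```python
from dataclasses import dataclass


@dataclass
class PressRelease:
    title: str
    tags: set[str]
    broadcast: bool = False  # hypothetical flag: show on every page


def items_for_page(releases, page_tag):
    """Items tagged to the page, plus anything marked as a broadcast."""
    return [r for r in releases if r.broadcast or page_tag in r.tags]


# Made-up example data.
citywide = PressRelease("Citywide snow emergency", set(), broadcast=True)
local = PressRelease("Street festival", {"Atlas District"})

atlas_titles = [r.title for r in items_for_page([citywide, local], "Atlas District")]
hill_titles = [r.title for r in items_for_page([citywide, local], "Capitol Hill")]
```

The citywide release shows up everywhere while its tags remain accurate (in this case, empty).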

Appearance of Timeliness

"a year old press release isn't 'current news'"

If you end up with a lot of automated pages (for instance, if you cover 30 different neighborhoods), then it's easy to wind up with a block labeled "Current News" that contains very old content. In addition, if you are displaying events, then events that are far in the future could crowd out an event happening tomorrow.
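Both problems can be handled with explicit date rules. In the sketch below, the 90-day cutoff is an arbitrary assumption chosen for illustration, and the `Item` class, function names, and sample data are made up.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class Item:
    title: str
    when: datetime  # publication date for news, start time for events


def current_news(items, now, max_age=timedelta(days=90), limit=3):
    """Newest first, dropping anything older than `max_age` or dated in the future."""
    fresh = [i for i in items if timedelta(0) <= now - i.when <= max_age]
    return sorted(fresh, key=lambda i: i.when, reverse=True)[:limit]


def upcoming_events(events, now, limit=3):
    """Soonest first, so tomorrow's event beats one that is months away."""
    future = [e for e in events if e.when >= now]
    return sorted(future, key=lambda e: e.when)[:limit]


# Made-up example data.
now = datetime(2009, 8, 31)
news = [Item("Year-old release", datetime(2008, 8, 1)), Item("Recent release", datetime(2009, 8, 20))]
events = [Item("Gala next summer", datetime(2010, 6, 1)), Item("Concert tomorrow", datetime(2009, 9, 1))]

news_titles = [i.title for i in current_news(news, now)]
next_event = [e.title for e in upcoming_events(events, now, limit=1)]
```

With a cutoff in place, a quiet neighborhood ends up with a short or empty "Current News" block, so you'd also want to decide what the page shows in that case.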

What to do about it?

Some high-level pointers:

  • Clearly articulate how you want your automatic pulls to work, as early in your process as possible.
  • Don't think of each block in isolation, but try to implement things in a consistent manner (for instance, by having page blocks behave in only a few different ways).
  • Similarly, consider whether developers should have control over all aspects of each block, or whether much of the aggregation should be available only through a consistent API.
  • Be mindful of the issues above when designing your page/block behavior and when training those who will be tagging.
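To make the "few different ways" idea concrete, here is a rough sketch in which every block is described by a small spec and goes through a single `pull` function. The two behaviors, the `BlockSpec` fields, and all names here are assumptions made for illustration, not a recommended final design.

```python
from dataclasses import dataclass
from datetime import date
from enum import Enum


class Behavior(Enum):
    LATEST_NEWS = "latest_news"          # newest first, nothing dated in the future
    UPCOMING_EVENTS = "upcoming_events"  # soonest first, nothing in the past


@dataclass(frozen=True)
class Item:
    title: str
    tags: frozenset
    day: date


@dataclass(frozen=True)
class BlockSpec:
    behavior: Behavior
    tag: str
    limit: int = 3


def pull(spec, items, today):
    """Single entry point: every block on every page applies the same rules."""
    tagged = [i for i in items if spec.tag in i.tags]
    if spec.behavior is Behavior.LATEST_NEWS:
        chosen = sorted((i for i in tagged if i.day <= today), key=lambda i: i.day, reverse=True)
    elif spec.behavior is Behavior.UPCOMING_EVENTS:
        chosen = sorted((i for i in tagged if i.day >= today), key=lambda i: i.day)
    else:
        raise ValueError(f"unsupported behavior: {spec.behavior}")
    return chosen[:spec.limit]


# Made-up example data.
items = [
    Item("Street festival recap", frozenset({"Atlas District"}), date(2009, 8, 20)),
    Item("Concert", frozenset({"Atlas District"}), date(2009, 9, 5)),
]
today = date(2009, 8, 31)
news = [i.title for i in pull(BlockSpec(Behavior.LATEST_NEWS, "Atlas District"), items, today)]
events = [i.title for i in pull(BlockSpec(Behavior.UPCOMING_EVENTS, "Atlas District"), items, today)]
```

Developers choose a behavior, a tag, and a limit. Rules such as embargoes or broadcasts would live inside `pull`, so they apply to every block.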

First published 31 August 2009