Jarrod Trainque

7Nov

Page-based blogging, tag-based taxonomies, and the next generation of online publishing tools

Ever since writing this post, I’ve been thinking a lot about the taxonomy of blogs and how content is organized on personal sites.

I think most web-based blogging applications (WordPress, Movable Type, Drupal, Textpattern, etc.) do a poor job of organizing and categorizing content, for a couple of reasons that I’ll explain below.

Sorting by time

First, it’s important to understand the nature of blogs. Traditional blogging applications primarily sort content based on date and time, with the newest content appearing at the top of the home page, the next-most recent content below the most recent article, and so on. We can think of this as sorting content through the dimension of time.

Time-based sorting of content makes sense for sites reporting timely or news-type content. Unfortunately, most bloggers don’t post content frequently or consistently enough to justify time as the primary method of sorting content. As a result, it’s not uncommon to come across a homepage consisting of posts with large time gaps. The problem is not that there are gaps, but that these gaps are not always apparent to visitors, since posts are presented in sequence regardless of the delay between posts.

Most individual entry pages (or permanent links for a single entry) also allow for browsing through time, usually by offering a “previous post/next post” option. The downside here is that sequential posts often have little to do with one another, and so it’s not particularly useful for visitors to navigate through posts in sequence. Just like on a home page, individual page “next/previous” navigation schemes don’t make the time gaps apparent. Users clicking on the “next post” link have no clear way of knowing whether they are jumping ahead weeks or hours when they navigate ahead one post.
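One way to make those gaps visible is to label the delay between consecutive posts wherever they are listed in sequence. The following is only a minimal sketch: the post titles, dates, thresholds, and the `gap_label` and `with_gaps` helpers are all made up for illustration and do not come from any blogging tool.

```python
from datetime import date

# Hypothetical posts as (title, publication date), newest first,
# the way a typical blog home page lists them.
posts = [
    ("Rant about Intelligent Design", date(2005, 11, 7)),
    ("A post about my cat", date(2005, 11, 5)),
    ("Notes on CSS", date(2005, 8, 1)),
]

def gap_label(newer: date, older: date) -> str:
    """Describe the delay between two posts in rough, readable terms."""
    days = (newer - older).days
    if days < 1:
        return "same day"
    if days == 1:
        return "1 day earlier"
    if days < 14:
        return f"{days} days earlier"
    if days < 60:
        return f"about {days // 7} weeks earlier"
    return f"about {days // 30} months earlier"

def with_gaps(posts):
    """Return the titles in order, with a gap label between neighbours."""
    lines = []
    for i, (title, published) in enumerate(posts):
        lines.append(title)
        if i + 1 < len(posts):
            lines.append(f"  ({gap_label(published, posts[i + 1][1])})")
    return lines

for line in with_gaps(posts):
    print(line)
```

The same label could sit next to a “previous post” link, so a visitor knows whether one click moves them back two days or three months.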

Sorting by categories

Categories provide another dimension for sorting content. Most blog posts indicate what category they belong to or are “filed to”, which allows visitors to navigate through content based on subject area or topic.

The problem with traditional hierarchical category models is that visitors browsing by category often get “stuck” in a particular category, with no way to break out into another related category.

Categories also fail when it comes to flexibility, due to their strict hierarchical taxonomy. Most bloggers start out with just a few categories, but as the quantity of content grows and their interests change, they find they need to create more categories.

While creating new categories is generally an easy process, there are some scalability problems that arise. A common problem is the need to separate out a category into more specific subject areas once extensive content is developed. For example, a blogger might have chosen “web design” as a category early on, but then as they posted more and more, found that they needed to break it apart into constituent categories such as “CSS”, “HTML”, “accessibility”, “usability” and so on. The process of going back and reassigning old posts into the new sub-categories is messy and undesirable. Unfortunately, the only true way to address this is to have some foresight into the possible categories you might use before you start developing content. This is a difficult task, akin to trying to see into the future.

Hierarchical categories also don’t offer any solution for the age-old knowledge management (KM) problem of content having more than one logical location. Consider the following taxonomy:

  • books
    • drama
    • suspense
    • romance
    • humor
    • reference
    • science
    • math

Imagine that this is the way you sort, for example, your book collection. Each end-node on the taxonomy corresponds to a physical shelf in your home library.

This taxonomy works fine, until you purchase a humor book about math. You’re left with a few options.

  1. You could move “humor” to a permanent subordinate location under “math”.
  2. You could move “math” to a permanent subordinate location under “humor”.
  3. You could make “humor” a new category subordinate to “math”, and keep the existing “humor” category for “pure” humor books
  4. You could make “math” a new category subordinate to “humor”, and keep the existing “math” category for “pure” math books
  5. You could create a new shelf called “math humor” and keep both existing “math” and “humor” categories for more pure math and pure humor books

Options 1 and 2 don’t make sense, since not all humor books are math-related, and not all math books are humor-related. They need to exist as separate shelves.

Options 3 and 4 don’t make sense either, for a few reasons. First, when it comes time to locate your book, you’d need to know that it was both “math” and “humor” related. Otherwise you end up looking in the wrong spot. Second, you end up with the same category duplicated in two places, once as a first-level category and again as a subordinate category. This isn’t very scalable. Lastly, you’d have to choose between option 3 and option 4, even though both are equally logical.

Option 5 doesn’t make much sense in terms of scalability, since as you add to your collection you’d need to create a new “shelf” for each book that is unlike every other book in your collection. You’d likely end up with as many shelves as you have books, in which case it’s simpler to forgo category-based organization altogether.
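To get a feel for how quickly option 5 gets out of hand, here is a small back-of-the-envelope sketch. It assumes the seven shelves from the taxonomy above and simply counts how many combination shelves could be needed in the worst case; the variable names are made up for illustration.

```python
from itertools import combinations

# The seven end-node shelves from the example taxonomy.
shelves = ["drama", "suspense", "romance", "humor", "reference", "science", "math"]

# One extra shelf for every possible two-topic book ("humor math", ...).
pair_shelves = [" ".join(pair) for pair in combinations(shelves, 2)]

# Upper bound if a book may combine any number of topics:
# every non-empty subset of the shelves.
all_combination_shelves = 2 ** len(shelves) - 1

print(len(pair_shelves))         # 21
print(all_combination_shelves)   # 127
```

Seven shelves already allow for 21 two-topic shelves, and 127 shelves if books may mix any number of topics.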

Ideally, the best solution is to figure out a way to file this “math humor” book onto both shelves. Strict hierarchical taxonomies make this difficult, much as it would be difficult to physically file a book away in two places. In the end, we use our best judgement and stick the book somewhere, leaving us with at best a 50% chance we’ll find it when we need it. (If you’re smart, you’d make a note on the unchosen shelf pointing to the chosen shelf. This is the workaround for filing something in two places.)

Tagging as a replacement for categories

Tagging, or the assigning of keywords to a particular post, has begun to really catch on, with many sites allowing you to sort all kinds of content, from photos to bookmarks to to-do items, via a tagging process. Tagging as a means for classifying content has strong potential to replace more hierarchical categories, because it solves some of the problems mentioned above.

First, tagging allows content creators to sort and file away content as it’s created, on the fly, based on the content itself. No need for a predefined picklist or dropdown… the allowable categories are as diverse as the content creator’s interests. Someone utilizing tag-based categorization of content need not worry about how each post fits into a predefined hierarchical taxonomy, since the taxonomy develops as the content develops. No more need to be able to see into the future in order to organize content.

Tagging allows multiple tags to be assigned to a particular piece of content, which nicely solves the dual-shelf problem mentioned above. In essence, you could easily file away the “math humor” book into the category “math”, the category “humor”, and the category created by the intersection of math and humor. No matter where you looked, you’d find the book you wanted.
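As a rough illustration of the idea, the sketch below files items under any number of tags and looks them up by one tag or by the intersection of several. The `TagIndex` class and the book titles are hypothetical, invented here for the example. Note that tags are created the moment they are first used, with no predefined list.

```python
from collections import defaultdict

class TagIndex:
    """A tiny in-memory tag index (illustrative only)."""

    def __init__(self):
        # Maps each tag to the set of items filed under it.
        self._items_by_tag = defaultdict(set)

    def add(self, item, *tags):
        """File an item under every given tag, creating tags on the fly."""
        for tag in tags:
            self._items_by_tag[tag.lower()].add(item)

    def find(self, *tags):
        """Return the items that carry all of the given tags."""
        sets = [self._items_by_tag.get(tag.lower(), set()) for tag in tags]
        return set.intersection(*sets) if sets else set()

index = TagIndex()
index.add("Math Jokes for Everyone", "math", "humor")  # hypothetical titles
index.add("Calculus Primer", "math")
index.add("Collected Limericks", "humor")

print(index.find("math"))           # both math books
print(index.find("humor"))          # both humor books
print(index.find("math", "humor"))  # only the math humor book
```

The math humor book turns up whether you look under “math”, under “humor”, or under both at once.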

It should be noted that while tagging is typically used to organize content by subject matter or topic, tagging also allows you to file away content via type, emotional descriptors, instructions to self, date, or special codes. For example, you could tag a piece of content as “essay”, “satire”, “funny”, “to read”, “xfglks” or “July 2nd, 2005” just as easily as you could tag it based on the subject matter. While this might be viewed by some as advantageous, I don’t think this is desirable with respect to blogs. Here I am specifically speaking of a tag-based framework as a possible replacement for categories, not as a replacement for every possible method of slicing content.

Considering publisher and visitor behavior

A major weakness in current blogging platforms is that they aren’t necessarily user-centric. For example, consider the vast multitude of blogs out there that aren’t the Kottkes of the blogosphere. Most blogs don’t get regular front-page visitors. Instead, most traffic arrives via Google and lands on an individual entry buried deep within the site that somehow relates to the user’s interest.

Once the user is done reading the individual entry, and perhaps leaving a comment, most blogging applications leave the user with a few options:

  1. Go to the home page
  2. Go to the next/previous post
  3. Stay in the same category and browse
  4. Visit the master archives page
  5. Click on a calendar for a specific date

If we assume that the home page is a mixture of topics scattered across days or weeks (i.e. it’s a mess), there’s no real incentive for the visitor to stick around, unless your blog is very narrow in terms of subject matter.

The next/previous links are probably not very useful either. Why should someone care about what you said next as a means for navigation? Again, this might be more useful if your blog stuck to a narrow topic, but otherwise you’re subjecting your visitor to one post about your cat followed by your rant about Intelligent Design, with no clear connection.

Browsing by category has value. After all, the visitor arrived at your site by querying Google for a “topic”. The kicker here is that if your category taxonomy is broad and shallow, visitors will locate more content, but some of it won’t be as relevant as they might like. On the other hand, a deep, complex taxonomy organizes your content better and keeps content relevant, but visitors can’t easily jump to related categories. Still, a better way to address categories (subject matter) is through a tagging system.

The master archives page, while used widely, is rather pointless. How often do you fire up a list of hundreds of posts in order to locate something? Worse yet, master archive pages are more likely to be sorted by date instead of topic, further making it difficult to locate the content you are looking for.

The calendar is the most useless of all. Why would someone choose “Tuesday, September 5th” as the primary incentive to click a link? Calendar functionality is nice for content creators to keep track of when they published stuff, and to see their publishing patterns, but doesn’t make sense for the end user.

The next generation blogging app

While I don’t think traditional blogging tools will go away, I expect we’ll soon begin to see some next generation web publishing tools crop up that function more like personal content management systems than time-based journals.

These next generation publishing tools will be unlike blogging applications in the sense that they will focus on the individual page more than the home page or the special archive pages. They will provide visitors with plenty of relevant navigation options from each individual page, and allow visitors to traverse a site based on a number of relevant dimensions.

In terms of structure, I propose two types of pages:

  1. Individual pages

    • Each individual page should have the usual post, post title, comments, etc. that one expects on any blog’s individual post page.
    • It should have links indicating what keyword(s) the post is tagged with. These are specifically topic-based tags.
    • It should have related tags that the post is not tagged with, but that the currently used tags share with other posts.
    • It should have links to other related articles that are tagged with the same or related tags as the current post.
    • It should have a separate tagging system for non-subject-based categorization (e.g. “rant”, “funny”, etc.).
    • It should have a separate tagging system for date, along with subdued “next/previous” arrows. Note that this would allow you to tag posts with more than just the published date.
    • It should have a search box.

  2. The Home/Archive/SERP page

    • A separate home page that would also function as a time-based archive page, topic archive page, non-subject-based categorization page, and search engine results page. Whenever a visitor clicked on any link on the individual page (except the next/previous and related posts links), they’d be taken to a custom home page with aggregated results. Visitors arriving at the home page via direct navigation would be presented with either a list of categories or a list of recent posts, depending on the preferences of the author.
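The “related tags” and “related articles” items on the individual page could be derived from tag co-occurrence. The sketch below is just one possible reading of that idea: the post slugs, the tags, and the `related_tags` and `related_posts` functions are all hypothetical, and ranking by the number of shared tags is an assumption rather than part of the proposal.

```python
from collections import Counter

# Hypothetical posts, each mapped to its set of topic tags.
posts = {
    "css-layout-tips": {"css", "web design"},
    "accessible-forms": {"accessibility", "html", "web design"},
    "css-for-print": {"css", "typography"},
    "cat-photos": {"cats"},
}

def related_tags(slug, posts):
    """Tags this post lacks that co-occur with its tags on other posts."""
    own = posts[slug]
    counts = Counter()
    for other, tags in posts.items():
        if other != slug and tags & own:
            counts.update(tags - own)
    # Most frequent first, ties broken alphabetically.
    ranked = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
    return [tag for tag, _ in ranked]

def related_posts(slug, posts):
    """Other posts sharing at least one tag, most shared tags first."""
    own = posts[slug]
    scored = [
        (len(tags & own), other)
        for other, tags in posts.items()
        if other != slug and tags & own
    ]
    return [other for _, other in sorted(scored, key=lambda s: (-s[0], s[1]))]

print(related_tags("css-layout-tips", posts))
# ['accessibility', 'html', 'typography']
print(related_posts("css-layout-tips", posts))
# ['accessible-forms', 'css-for-print']
```

A post with no tags in common, such as the cat photos here, never shows up, which is the whole point: every link offered on the page has some connection to what the visitor just read.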

In addition, I also propose that this page-blogging tool (plog?) be extremely lightweight and tiny:

  1. One file to generate the individual entry page
  2. One file to generate the multi-functional home page/SERP/archives page
  3. A CSS file for styling
  4. A MySQL database backend
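For the database backend, three tables would probably be enough. What follows is a hedged sketch rather than a finished design: it uses Python’s built-in `sqlite3` module as a stand-in for MySQL so that it runs without a server, and every table, column, and function name is an assumption made for illustration. The `kind` column keeps topic tags, non-subject tags, and date tags in separate namespaces. `INSERT OR IGNORE` is SQLite syntax; MySQL spells it `INSERT IGNORE`.

```python
import sqlite3

# In-memory SQLite database standing in for the MySQL backend.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE posts (
    id        INTEGER PRIMARY KEY,
    title     TEXT NOT NULL,
    published TEXT NOT NULL          -- ISO date, e.g. 2005-11-07
);
CREATE TABLE tags (
    id   INTEGER PRIMARY KEY,
    name TEXT NOT NULL,
    kind TEXT NOT NULL,              -- e.g. 'topic', 'style', 'date'
    UNIQUE (name, kind)
);
CREATE TABLE post_tags (
    post_id INTEGER NOT NULL REFERENCES posts(id),
    tag_id  INTEGER NOT NULL REFERENCES tags(id),
    PRIMARY KEY (post_id, tag_id)
);
""")

def add_post(title, published, tags):
    """Insert a post; tags is a list of distinct (kind, name) pairs."""
    cur = conn.execute(
        "INSERT INTO posts (title, published) VALUES (?, ?)",
        (title, published),
    )
    post_id = cur.lastrowid
    for kind, name in tags:
        # Create the tag only if it does not exist yet.
        conn.execute(
            "INSERT OR IGNORE INTO tags (name, kind) VALUES (?, ?)",
            (name, kind),
        )
        tag_id = conn.execute(
            "SELECT id FROM tags WHERE name = ? AND kind = ?",
            (name, kind),
        ).fetchone()[0]
        conn.execute(
            "INSERT INTO post_tags (post_id, tag_id) VALUES (?, ?)",
            (post_id, tag_id),
        )
    return post_id

def posts_tagged(kind, name):
    """Titles of all posts carrying the given tag, newest first."""
    rows = conn.execute(
        """
        SELECT p.title
        FROM posts p
        JOIN post_tags pt ON pt.post_id = p.id
        JOIN tags t ON t.id = pt.tag_id
        WHERE t.kind = ? AND t.name = ?
        ORDER BY p.published DESC
        """,
        (kind, name),
    )
    return [row[0] for row in rows]

add_post("Why calendars fail", "2005-11-07",
         [("topic", "blogging"), ("style", "rant")])
add_post("Tagging basics", "2005-10-01",
         [("topic", "blogging"), ("topic", "tagging")])

print(posts_tagged("topic", "blogging"))
# ['Why calendars fail', 'Tagging basics']
```

A query like `posts_tagged` is essentially all the multi-functional home page would need: clicking a topic tag, a style tag, or a date tag on an individual page would run the same query with different arguments.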

Oh, and there’s no reason it shouldn’t employ AJAX, right? Why bother with a backend publishing console when you can point, click, and edit right on each page?

Summary

In short, I propose that blogging as a means of online publishing should change to better suit user behavior for both content consumers and publishers.

If any skilled programmers out there want to work with me to make this next-generation online publishing tool a reality, drop me a line.


4 Comments

  1. Comment by erik — November 7, 2005 @ 11:53 am

    I mostly agree with your sentiments, though many, if not most of the features that you state are available in Drupal.

    “Free Tagging” is going to be built into the upcoming 4.7. Basically a tag is the same as a taxonomy, with all rights and privileges as such. Multiple categories, multiple taxonomies and hierarchies are all possible. Editing an entry is done at the page level, though there are batch functions as well.

    Tagging is one of the areas that AJAX style interactivity has been subtly put to use. When typing a tag it will actively search your taxonomy and present you with possible matches.

    With the release of 4.7 there is also a breakaway from column layouts, where ‘blocks’ can be displayed not just in the sidebar but in other areas as well.

    At 447 KB zipped or 1.4 MB installed it’s got a tiny footprint, though it doesn’t meet your goal of just a handful of files.

    What I admire most about Drupal is that it’s not only striving to be easier to use but more flexible as well. Instead of ‘create blog post’ or ‘create event’ it’s heading more in the direction where the admin decides what fields are necessary and can still hook them into the rest of the system, like web Legos.

    I agree with you that calendar based archives are next to useless. I think the only person they would work for is the content creator, where their memory would fill in the necessary gaps to recall just what happened during a specific month and year.

  2. Comment by Andrew — November 15, 2005 @ 10:19 pm

    Most blogs also have the capacity to file something under multiple categories.

    Unless I’m missing a crucial point here I don’t see why one can’t simply use categories as tags [and rename them to such if it makes them feel any better]

  3. Comment by Jarrod — November 16, 2005 @ 5:11 pm

    You could, but freetagging has the advantage of being far more flexible. You can create tags on the fly, whereas categories often need to be created separately.

  4. Comment by Marco — November 23, 2005 @ 8:56 pm

    Well, the ideal, next generation tool is already, or will be soon, available. Unlike Wordpress and Drupal, it also has a superb interface and makes great use of Ajax. Care to see it?

    http://www.mad4milk.net/entries/mooflex

    The screenshots do not do it justice, you have to see it live!
