Posted on Tuesday, 12.13.05
When will all the tagging madness come to an end? I think tagging is a noble effort, but it should not have to last forever. Simply put, tagging is a method by which people, rather than automated, learning-based apps, categorize unstructured content by associating one or more keywords with it. Given that, we should view tagging as a way to get the big engines started, sort of like the hand-cranked engine starters of back in the day.
Machine-based categorization engines exist, but to get them going we need to ‘feed’ them a reliable set of reference documents, if you will, so they can ‘learn’ what belongs to a given category and what doesn’t. After that, they can compare any new document against their reference set and categorize it accordingly. Of course, I am oversimplifying the problem here, but a lot of work has gone into making these engines work reasonably well. Some of the toughest problems have been overcome (taxonomies, self-organizing or self-expanding taxonomies, etc.); what remains hard is seeding such engines with enough reference data to teach them right from wrong. Now, I am sure there are enough of these learning-based categorization engines available (most are used within corporate intranets or are integrated with other business apps), so couldn’t we:
- start harvesting the tagged content;
- use the same ranking algorithm that search services (say, Technorati) apply when retrieving tagged content to decide which tagged items should be “learned from” and which should be discarded as mis-tagged;
- start “teaching” the knowledge-hungry categorization engines by feeding them the already-tagged content?
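To make the three steps a bit more concrete, here is a minimal sketch in Python. It is not how any particular engine (FizzyLab's, Technorati's or anyone else's) works: it swaps in a simple nearest-centroid classifier over word counts, and `rank_score` is a hypothetical stand-in for whatever relevance score a search service's ranking algorithm would assign, since the post doesn't specify that algorithm. Items scoring below a threshold are treated as mis-tagged and dropped before training; the remaining items become the reference set that new documents are compared against.

```python
import math
import re
from collections import Counter, defaultdict
from dataclasses import dataclass


@dataclass
class TaggedItem:
    text: str
    tag: str
    rank_score: float  # hypothetical score from a search-style ranker (assumption)


def tokenize(text):
    # Lowercase word tokens only; a real engine would do much more (stemming, stop words, ...).
    return re.findall(r"[a-z]+", text.lower())


def select_training_set(items, min_score):
    # Step 2: keep items the ranker trusts; low scorers are treated as mis-tagged.
    return [it for it in items if it.rank_score >= min_score]


def train_centroids(items):
    # Step 3: build one average term-frequency vector ("centroid") per tag.
    sums = defaultdict(Counter)
    counts = Counter()
    for it in items:
        sums[it.tag].update(tokenize(it.text))
        counts[it.tag] += 1
    return {
        tag: {term: n / counts[tag] for term, n in terms.items()}
        for tag, terms in sums.items()
    }


def cosine(a, b):
    # Cosine similarity between two sparse term-weight mappings.
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    na = math.sqrt(sum(w * w for w in a.values()))
    nb = math.sqrt(sum(w * w for w in b.values()))
    return dot / (na * nb) if na and nb else 0.0


def categorize(text, centroids):
    # Compare a new document with each tag's reference centroid and pick the closest.
    vec = Counter(tokenize(text))
    return max(centroids, key=lambda tag: cosine(vec, centroids[tag]))


if __name__ == "__main__":
    # Step 1 (harvesting) is faked with a tiny hypothetical sample.
    harvested = [
        TaggedItem("python list comprehension tutorial", "programming", 0.9),
        TaggedItem("debugging a python generator", "programming", 0.8),
        TaggedItem("slow cooker chili recipe with beans", "cooking", 0.9),
        TaggedItem("baking sourdough bread at home", "cooking", 0.7),
        TaggedItem("cheap flights to paris", "programming", 0.1),  # likely mis-tagged
    ]
    centroids = train_centroids(select_training_set(harvested, min_score=0.5))
    print(categorize("a python tutorial on generators", centroids))  # prints "programming"
```

The threshold filter is the crude part: it simply trusts the ranker, so the quality of the "teaching" set depends entirely on how good that score is at spotting mis-tagged items.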
I think hand-tagging multimedia (non-text) content is probably the only practical solution for the time being, but for text-based content we have alternatives that we should leverage. After all, we all have better things to do than tagging, don’t we?
Tagging of text-based content should be done by machines. We, as volunteer users, should participate in teaching them how to categorize and tag, with the expectation that machines will take over as soon as possible. However, the way things are currently being set up, it doesn’t look like anyone has that goal in mind.
I wonder if Google, Yahoo! and Microsoft are working on something along these lines. There are many speculative explanations of why Yahoo! bought del.icio.us, but one I haven’t heard is that maybe they saw the value in a service that attracts a free pool of people willing to categorize the content on the web. I think del.icio.us would make an excellent teaching tool.
Thoughts?

Between 1999 and 2001, I worked for a company called FizzyLab. One of the products we built was a learning-based categorization engine. Once it had been ‘taught’, it used our core enabling technology, Content Relevator, which modeled and then indexed entire documents instead of individual keywords or phrases within a document, to compare and categorize new documents.
The company is gone, but the technology was licensed by Intelligent Results, a Seattle company.