Monthly Archives: May 2005

Semantic Web: Heated Debate

Another provocative post on the value of the Semantic Web compared to the semantic web has spurred some more debate at Danny Ayers’s blog.

What’s clear from the discussion is that there’s still fairly serious disagreement about what the thrust of the Semantic Web is, as well as the practical limitations of its implementation via RDF/XML.

What I’d like to understand: what are the practical limitations of using MicroFormats? What do you lose by taking an approach that is based on MicroFormats instead of RDF? And if GRDDL provides a data migration path from MF to RDF, do those limitations really carry any weight?
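For the curious, GRDDL’s actual mechanism is an XSLT transform referenced from the page’s profile. Purely as an illustration of the idea, here is a toy Python sketch that scrapes hCard-style class names out of markup and emits RDF triples; the `_:card` blank node and the vcard-rdf namespace are my assumptions, not anything GRDDL mandates:

```python
from html.parser import HTMLParser

# hCard property class names we look for, and an RDF vocabulary to map them to.
# The namespace choice is illustrative; GRDDL itself says nothing about it.
PROPS = {"fn", "org", "nickname"}
NS = "http://www.w3.org/2001/vcard-rdf/3.0#"

class HCardExtractor(HTMLParser):
    """Collects the text of elements whose class attribute names an hCard property.
    Assumes well-balanced markup -- this is a sketch, not a robust parser."""
    def __init__(self):
        super().__init__()
        self.open_props = []   # stack: hCard properties open at each depth
        self.fields = {}       # property -> accumulated text

    def handle_starttag(self, tag, attrs):
        classes = (dict(attrs).get("class") or "").split()
        self.open_props.append([c for c in classes if c in PROPS])

    def handle_endtag(self, tag):
        if self.open_props:
            self.open_props.pop()

    def handle_data(self, data):
        for props in self.open_props:
            for prop in props:
                self.fields[prop] = (self.fields.get(prop, "") + data).strip()

def hcard_to_triples(markup, subject="_:card"):
    """Reduce hCard markup to N-Triples-style statements about a blank node."""
    parser = HCardExtractor()
    parser.feed(markup)
    return [f'{subject} <{NS}{prop.upper()}> "{text}" .'
            for prop, text in sorted(parser.fields.items())]
```

The point of the exercise: the structure is already sitting in the page, so the RDF can be derived on demand rather than authored up front.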

Semantic Web: A Critique

I just ran across this gem, which somewhat mercilessly shreds the idea of the Semantic Web as hopelessly at odds with the complexities of the world it is trying to represent. It concludes with this bit:

Much of the proposed value of the Semantic Web is coming, but it is not coming because of the Semantic Web. The amount of meta-data we generate is increasing dramatically, and it is being exposed for consumption by machines as well as, or instead of, people. But it is being designed a bit at a time, out of self-interest and without regard for global ontology. It is also being adopted piecemeal, and it will bring with it all the incompatibilities and complexities that implies. There are significant disadvantages to this process relative to the shining vision of the Semantic Web, but the big advantage of this bottom-up design and adoption is that it is actually working now.

Are MicroFormats the kind of “piecemeal” effort described above? I think so.

Structure on the Web: A Survey

Okay, having made a long-winded setup in previous posts, I want to delve into the real substance of the matter. If you accept the idea that adding structure (read: semantics) to content on the web will open up grand new possibilities, making content more accessible and useful, the question is: what’s the best approach? What follows is a brief survey of the various alternatives currently under development and discussion.
Continue reading

Structure and Semantics: The Semantic Web

Long before the explosion of the blogosphere, some very smart people were thinking about how to make information on the web more useful. In 2000, most people were finding information on the web by using one of the then-popular search engines, like Yahoo or Excite. But as the web had grown, the experience of using these systems had deteriorated. It wasn’t that these systems couldn’t find pages somehow relevant to what you were looking for; to the contrary, the problem was that there was too much information on just about any topic you could conceive. We were beginning to hit a fundamental limit on the utility of pure text retrieval systems given the size of the corpus. And that corpus was huge.
Continue reading

Distribution of Content Authorship, Part II

In the previous post, I wrote about the trend toward broadening the distribution of content authorship, and how the emergence of blogs caused an explosive acceleration of that trend. In general, putting authorship into the hands of individual authors is important and valuable — if content is authored “at the source” you have accomplished a kind of disintermediation with all its attendant rewards: reduced expense, more timely dissemination, etc. But it’s not all rosy.

Imagine the world of a Billion Blogs, each one posted to with varying frequency. Authors are diligently linking to each other’s posts in relevant ways. Readers are leaving comments in various ways. Trackbacks are happening. Now, if you’re looking for some particular piece of information in this space, what do you do? Google (at least in its current form) isn’t an appropriate resource.

Instead, you probably turn to a service like Technorati or Feedster. These services have solved at least one part of the problem where Google would currently fail: by implementing a “ping” interface, they are able to know the instant that a posting has been made on any one of those billion blogs. The URL of your new post immediately goes into a queue to be processed and then included in an index. There’s obviously a question of scale here, but clearly this is a better way to index rapidly changing information than the old approach of just sending a spider around on a random walk through a web of links to find new content.
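As a back-of-the-envelope model of that ping-then-index flow (all the names here are invented; neither Technorati nor Feedster publishes its internals), a minimal sketch:

```python
import collections
import time

class PingIndex:
    """Toy model of a Technorati-style ping service: blogs announce new posts,
    and a worker drains the queue into an inverted index right away -- no
    crawler has to stumble across the post by following links."""

    def __init__(self):
        self.queue = collections.deque()                # URLs awaiting processing
        self.index = collections.defaultdict(set)       # term -> set of URLs
        self.pinged_at = {}                             # URL -> time of ping

    def ping(self, url):
        # Called by the blog software the moment a post is published.
        self.pinged_at[url] = time.time()
        self.queue.append(url)

    def process(self, fetch):
        # Drain the queue; `fetch` stands in for retrieving and tokenizing a post.
        while self.queue:
            url = self.queue.popleft()
            for term in fetch(url):
                self.index[term].add(url)

    def search(self, term):
        # Fresh posts are findable as soon as process() has run.
        return sorted(self.index.get(term, set()))
```

The contrast with a crawler is the push: the latency from publish to searchable is bounded by the queue, not by how long a spider takes to wander back to your site.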

But another problem remains. View the entire collection of blog posts as an ever-expanding ball. Older posts are at the core, newer posts are on the surface. In general, links point inward, toward the core. If you use a system like Google’s PageRank algorithm, you’ll tend to favor posts at the core. Which is not really what you want: at any given moment, the most interesting action is at the surface. But, while you can include those surface posts in an index the moment they appear, you can’t really apply the same kind of PageRank metric to them, because no one else has linked to them yet. You can apply a proxy for this missing linkage: assume that blogs that achieved good linkage in the past are more relevant, useful, or trustworthy now. But in the world of a Billion Blogs, that approximation will probably fall down: you won’t reliably be able to assume that someone who once made a popular post will continue to issue popular posts. Indeed, you may already find that searching blog postings on Technorati or Feedster is less satisfying than searching the good ol’ web on Google.
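To make the core-versus-surface point concrete, here is a plain power-iteration PageRank sketch (the graph and node names are invented for illustration): a brand-new post with no inbound links can only ever earn the teleportation floor, no matter how good it is.

```python
def pagerank(links, damping=0.85, iters=50):
    """Plain power-iteration PageRank over a dict of node -> list of outlinks."""
    nodes = list(links)
    n = len(nodes)
    rank = {node: 1.0 / n for node in nodes}
    for _ in range(iters):
        new = {node: (1 - damping) / n for node in nodes}
        for node, outs in links.items():
            if not outs:
                # Dangling node: spread its rank evenly (one common convention).
                for m in nodes:
                    new[m] += damping * rank[node] / n
            else:
                for m in outs:
                    new[m] += damping * rank[node] / len(outs)
        rank = new
    return rank

# Links point inward: the fresh post cites older posts, nothing cites it yet.
links = {
    "core":  [],                 # heavily cited old post (dangling: links nowhere)
    "mid1":  ["core"],
    "mid2":  ["core"],
    "fresh": ["mid1", "mid2"],   # brand-new post at the surface
}
```

Run it and "core" dominates while "fresh" sits at the bottom, which is exactly the inversion of what a blog searcher wants.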

So what’s the solution? How do we get back to Google-level performance in finding relevant information in the world of a Billion Blogs?

That’s for the next post. But I’ll give you a hint: it’s about semantics and structure.

Distribution of Content Authorship, Part I

What the web promised 10 years ago is now finally being delivered by blogs.

A decade ago, the emergence of the web as a media platform offered the possibility of turning the established publishing model on its head: since anyone could put up a website, the theory went, the playing field would be leveled and anyone who had something to say could set up their virtual soapbox and say it with a voice equal to that of the traditional players.

But it didn’t work out that way. While a number of new brands successfully established themselves as media outlets on the web (e.g., C|Net, Slate, etc.), the biggest voices on the web turned out to be the same ones that we’d already been familiar with before the web: The New York Times, The San Francisco Chronicle, The Mercury News, etc.

What happened to the early promise of the web? Where were the thousands of soap boxes? Why didn’t they materialize on the web as we knew it in, say, 2002?
Continue reading

Blog Launch!

If a blog is syndicated, but no one subscribes to it, does it make any sound?

Stay tuned, hypothetical reader, as we explore philosophical questions of great import such as this.