Category Archives: General

General posts

Geeking out!

Having wrapped up my previous gig as VP of Engineering at UniversityNow earlier this month, I’ve been spending my free time over the past couple of weeks geeking out a bit, something I haven’t done with any real intensity in ages.

First up has been moving this here blog to EC2. I made heavy use of this writeup – extremely helpful.

I’ve been toying with moving my mail server to EC2 using node.js-backed Haraka. I’ve got the configuration working as I’d like, but I’m still negotiating with Amazon over the parts that they need to do to encourage other mail hosts (I’m looking at you, Yahoo mail) to believe that my server is a legitimate source of email and not a spambot. It’s unclear whether this will be successful, but it’s been educational nonetheless.

I also got myself a new laptop and have been happily hacking away on that too. I dumped my personal vim configuration in favor of the Janus distribution and am still working my way back to proficiency with the new key bindings: “Thumb-tied and twisted, just a vim-bound misfit, I”

I’m also planning to set up a Boot Camp/VirtualBox instance so I can run the occasional Windows program, like IE.

Nathalie is puzzled at what all the fuss is about, but I can begin to feel the power flowing again, which is nice. Better, stronger, faster.


It’s alive!

Spurred on by Bob’s having blog-tagged me, I decided to dust off this WordPress installation and take it for a spin. That involved upgrading to the latest version of WordPress and, in the spirit of the New Year, grabbing a new theme, Ocadia. What do you think? The little swirly icon in the theme makes me think of both Tolkien and the Artist FKaP. It’s probably some unforgivably offensive curse in a druid language, and now I’m going to have a bunch of angry trolls swinging axes around.

The internets can be a dangerous place.

Apropos of nothing, I leave you with this important lesson in history. Warning: this is not work safe.

Anybody home?

Percy Cabello has posted an interview with Mitch Kapor over at Mozilla Links, in which Mitch talks about Foxmarks, a project that he and I have been collaborating on. There’s a nod in that interview to this blog, which as you can see has been fairly dormant for months. So I thought I’d take this opportunity to post back in here, mostly so it doesn’t look so cob-webbed.

And maybe I’ll start posting rants here again, much like I did back in the day (i.e., this summer). But I wouldn’t hold my breath. Not that you would, of course. Just a figure of speech. If you want to read more about Foxmarks, we’re blogging it here.

Semantic Web: A Critique

I just ran across this gem, which somewhat mercilessly shreds the idea of the Semantic Web as hopelessly at odds with the complexities of the world it is trying to represent. It concludes with this bit:

Much of the proposed value of the Semantic Web is coming, but it is not coming because of the Semantic Web. The amount of meta-data we generate is increasing dramatically, and it is being exposed for consumption by machines as well as, or instead of, people. But it is being designed a bit at a time, out of self-interest and without regard for global ontology. It is also being adopted piecemeal, and it will bring with it with all the incompatibilities and complexities that implies. There are significant disadvantages to this process relative to the shining vision of the Semantic Web, but the big advantage of this bottom-up design and adoption is that it is actually working now.

Are MicroFormats the kind of “piecemeal” effort described above? I think so.

Structure on the Web: A Survey

Okay, having made a long-winded setup in previous posts, I want to delve into the real substance of the matter. If you accept the idea that adding structure (read: semantics) to content on the web will open up grand new possibilities, making content more accessible and useful, the question is: what’s the best approach? What follows is a brief survey of the various alternatives currently under development and discussion.

Structure and Semantics: The Semantic Web

Long before the explosion of the blogosphere, some very smart people were thinking about how to make information on the web more useful. In 2000, most people were finding information on the web by using one of the then-popular search engines, like Yahoo or Excite. But as the web had grown, the experience of using these systems had deteriorated. It wasn’t that these systems couldn’t find pages somehow relevant to what you were looking for; to the contrary, the problem was that there was too much information on just about any topic you could conceive. We were beginning to hit a fundamental limit on the utility of pure text retrieval systems given the size of the corpus. And that corpus was huge.

Distribution of Content Authorship, Part II

In the previous post, I wrote about the trend toward broadening the distribution of content authorship, and how the emergence of blogs caused an explosive acceleration of that trend. In general, putting authorship into the hands of individual authors is important and valuable — if content is authored “at the source” you have accomplished a kind of disintermediation with all its attendant rewards: reduced expense, more timely dissemination, etc. But it’s not all rosy.

Imagine the world of a Billion Blogs, each one posted to with varying frequency. Authors are diligently linking to each other’s posts in relevant ways. Readers are leaving comments in various ways. Trackbacks are happening. Now, if you’re looking for some particular piece of information in this space, what do you do? Google (at least in its current form) isn’t an appropriate resource.

Instead, you probably turn to a service like Technorati or Feedster. These services have solved at least one part of the problem where Google would currently fail: by implementing a “ping” interface, they are able to know the instant that a posting has been made on any one of those billion blogs. The URL of your new post immediately goes into a queue to be processed and then included in an index. There’s obviously a question of scale here, but clearly this is a better way to index rapidly changing information than the old approach of just sending a spider around on a random walk through a web of links to find new content.
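To make the ping idea concrete, here is a minimal sketch of a ping-driven indexer. It is an illustration only and not how Technorati or Feedster actually work: the `PingIndexer` class and its `ping`, `process`, and `search` methods are hypothetical names, and the post text is passed in directly with the ping (a real service would fetch the URL itself).

```python
from collections import defaultdict, deque


class PingIndexer:
    """Toy ping-driven indexer (hypothetical; not any real service's API)."""

    def __init__(self):
        self.queue = deque()            # pinged posts waiting to be indexed
        self.index = defaultdict(set)   # word -> set of URLs containing it

    def ping(self, url, text):
        # Assumption: the ping carries the post text so the sketch stays
        # self-contained; a real service would fetch the URL on its own.
        self.queue.append((url, text))

    def process(self):
        # Drain the queue in arrival order and add each post to the index.
        processed = 0
        while self.queue:
            url, text = self.queue.popleft()
            for word in text.lower().split():
                self.index[word].add(url)
            processed += 1
        return processed

    def search(self, word):
        # Return the URLs of indexed posts containing the word, sorted.
        return sorted(self.index.get(word.lower(), set()))


indexer = PingIndexer()
indexer.ping("http://example.com/new-post", "Structure and semantics on the web")
indexer.process()
matches = indexer.search("semantics")
```

The point is that a post becomes searchable as soon as its ping is processed, with no crawler having to stumble across it first.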

But another problem remains. View the entire collection of blog posts as an ever-expanding ball. Older posts are at the core, newer posts are on the surface. In general, links point inward, toward the core. If you use a system like Google’s PageRank algorithm, you’ll tend to favor posts at the core. Which is not really what you want: at any given moment, the most interesting action is at the surface. But, while you can include those surface posts in an index the moment they appear, you can’t really apply the same kind of PageRank metric to them, because no one else has linked to them yet. You can apply a proxy for this missing linkage: assume that blogs that achieved good linkage in the past are more relevant, useful, or trustworthy now. But in the world of a Billion Blogs, that approximation will probably break down: you won’t reliably be able to assume that someone who once made a popular post will continue to issue popular posts. Indeed, you may already find that searching blog postings on Technorati or Feedster is less satisfying than searching the good ol’ web on Google.
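The bias toward the core is easy to see on a toy example. What follows is a rough sketch using a textbook-style power iteration, not Google’s actual implementation; the `pagerank` function, the damping value of 0.85, and the five-post link graph are all assumptions made up for illustration. Posts are numbered from oldest (0) to newest (4), and each post links only to older ones.

```python
def pagerank(links, damping=0.85, iterations=50):
    """Simplified PageRank by power iteration (illustrative sketch only).

    links[i] is the list of post indices that post i links to.
    """
    n = len(links)
    ranks = [1.0 / n] * n
    for _ in range(iterations):
        # Posts with no outgoing links spread their rank evenly over all posts.
        dangling = sum(ranks[i] for i in range(n) if not links[i])
        new_ranks = [(1.0 - damping) / n + damping * dangling / n] * n
        for src, targets in enumerate(links):
            for dst in targets:
                new_ranks[dst] += damping * ranks[src] / len(targets)
        ranks = new_ranks
    return ranks


# Hypothetical blogosphere: index 0 is the oldest post, index 4 the newest.
# Links point inward, from newer posts toward older ones.
links = [[], [0], [1, 0], [2, 1], [3, 2]]
ranks = pagerank(links)
```

In this graph the scores fall strictly with recency: the oldest post ranks highest, and the newest post, which nothing links to yet, ranks lowest.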

So what’s the solution? How do we get back to Google-level performance in finding relevant information in the world of a Billion Blogs?

That’s for the next post. But I’ll give you a hint: it’s about semantics and structure.

Distribution of Content Authorship, Part I

What the web promised 10 years ago is now finally being delivered by blogs.

A decade ago, the emergence of the web as a media platform offered the possibility of turning the established publishing model on its head: since anyone could put up a website, the theory went, the playing field would be leveled and anyone who had something to say could set up their virtual soapbox and say it with a voice equal to that of the traditional players.

But it didn’t work out that way. While a number of new brands successfully established themselves as media outlets on the web (e.g., C|Net and Slate), the biggest voices on the web turned out to be the same ones that we’d already been familiar with before the web: The New York Times, The San Francisco Chronicle, The Mercury News, etc.

What happened to the early promise of the web? Where were the thousands of soap boxes? Why didn’t they materialize on the web as we knew it in, say, 2002?

Blog Launch!

If a blog is syndicated, but no one subscribes to it, does it make any sound?

Stay tuned, hypothetical reader, as we explore philosophical questions of great import such as this.