ODP, hierarchical organization, and other thoughts

I went to a google@work seminar in Dallas last week. It was mostly a sales pitch for Google’s enterprise services, but there were a few interesting bits, such as a glimpse of Google’s intranet. One other thing stood out, and it prompted this post. Part of Google’s pitch is that hierarchical organization is dead. More than that, all hierarchical models of organization are bad. Whether it’s directories on your hard disk, folders on your desktop, folders in your email program, categorical tagging of RSS feeds, or topical organization of website contents, it’s all bad, bad, bad. The one true way, they claim, is to dump all your data into a single chaotic mess and “embrace the chaos”. By which they mean, of course, purchase Google Enterprise products and services to search for what you need. After all, how else will you ever find what you’re looking for once your data is lost in the chaotic mess? Asking a company that specializes in searching unorganized data how to organize your data strikes me as very like asking the barber if you need a haircut. The answer will profit someone, but probably not you.

Somewhere in the PowerPoint presentation was a frame actually titled “Hierarchical organization is dead”, illustrated by a full-frame image of the Open Directory Project’s index page. The sad thing is not so much that they used this example, but that it was such a powerful one. It drew a fair amount of laughter from the audience as the Google guy talked about how sites like ODP used to think they could manually categorize the Internet. He asked how many of the 100+ people present used (or were even aware of) ODP or similar directories for finding things on the web; no hands were raised. Then he asked how many people used search engines like Google to find things on the web: every hand went up. More laughter.

This is one of two events that recently brought home to me just how dead ODP is. The other was when I tried to log in to my ODP editor account and discovered ODP was down. A little research revealed it had been down for quite a while. Apparently there was a hardware failure back in October 2006. AOL techs somehow bungled the restore process, resulting in the unrecoverable destruction of large parts of ODP. Then they discovered they’d forgotten to make backups for the last few years. Oops. Since then, they’ve been slowly reconstructing things. The content itself was salvaged from one of the weekly data dumps, but all or most of the editor metadata was lost. Information is scarce because AOL has mostly forgotten about ODP and ODP staff continue to be very secretive about everything that goes on. While many public portions of ODP are back online, much of the editor functionality is still down six months later. At least one of the important servers used by the editors is still offline. The really surprising thing is not just that I hadn’t noticed ODP being down, but that the web as a whole hadn’t noticed. There was a time when ODP being down for weeks would have been front-page news on sites like Slashdot. Other than ODP editors and a few obscure SEO blogs, no one noticed it was gone.

While I don’t agree with Google’s conclusion that all hierarchical organization is bad, I think they are right in the case of web directories. A hand-edited directory simply isn’t a useful or reasonable way to organize web sites compared to more modern social bookmarking systems like del.icio.us or reddit. It’s an adapt-or-die world and, sadly, ODP doesn’t seem to be the sort of organization that can adapt to the changes taking place.

I expect ODP will limp along if AOL continues to allow it, but I don’t hold out any hope that ODP is ever going to fully return from the dead. I’m still an editor, and I will continue to assist with data integrity checking on the weekly XML data dumps (which have finally resumed, by the way). However, I’m working with another editor to migrate the dump-checking process to an ODP server so it won’t take up my time or energy anymore. I’m also spending far less time on my other ODP-related projects.

Speaking of social information processing, Kristina Lerman of USC published an interesting paper on the subject this month, “Social Information Processing in Social News Aggregation” (PDF format). The paper looks at how Digg exploits social information processing to solve the problem of rating aggregated news stories.

Ray Rainwater RIP

It’s been busy and I’ve fallen behind on posting anything new lately. It’s been a mixed month of good news and bad. The bad news was hearing that Ray Rainwater had died. While not totally unexpected, one never likes to lose a friend. In this case a friend Susan and I knew only through the Internet. Our paths crossed doing genealogical research on the Rainwater family and we’ve corresponded with Ray frequently over the last couple of years. We’d talked about making the trip to Alaska to meet him and he had hoped to make the trip to Dallas at one point as well. Neither happened in time.

He used to send me reminders when I hadn’t updated my weblog in a while (it’s always nice to know somebody actually reads this thing!) and often offered interesting, related anecdotes from his life. When I wrote about my feelings the morning after 9/11, he was reminded of his reaction to the 1941 attack on Pearl Harbor. Whether discussing current events or the specs of the latest digital cameras, there was always something interesting in his emails. He and Susan frequently discussed genealogical mysteries. He will be missed.

There’s been some good news this month as well. Business continues to pick up, and there really do seem to be signs of an economic recovery. In what spare time I have, we’ve started a major reorganization of the ODP Robotics categories and have already doubled the size of the category. Meanwhile, after a week or so of downtime, ODP finally installed the new servers for the public side of the site. The new server for the editor site is also up, and this week they’re replacing the server that runs the internal forums. The new ODP servers run Linux instead of Solaris, and the proprietary forum software has been replaced with GPL’d software. So we’re one tiny step closer in the quest to run the Open Directory Project on Open software.

ODP/dmoz Update

(Sinus update: It’s been about one week since the surgery. I’m off all but a few of the drugs, I’m back to my usual routine at work, and I feel great; better than I’ve felt in a year. I can breathe, taste, and smell. I feel a few years younger.)

The latest RDF dump error report shows no XML character errors for the third week running. Invalid UTF-8 sequences are down from hundreds to just two this week. It’s definitely the best dump ever, and I’m keeping my fingers crossed that next week’s dump will be 100% error-free at the character-encoding level. In anticipation of that, I’ve started compiling an ODP RDF ToDo list of other bugs and optimizations that need work. I’ve made some progress on one of the oft-requested features for the dump: breaking the full 1GB dump into smaller, category-specific dumps. While testing things out, I’m hosting the smaller dumps locally, but if they start seeing a lot of use, hopefully they’ll get moved to an ODP server with enough bandwidth to handle them.
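The character-level checks counted in a report like this are simple to sketch. Here is a minimal Python version, assuming the dump is scanned line by line as raw bytes: each line is first decoded as UTF-8, and the decoded text is then searched for code points outside the XML 1.0 `Char` production (tab, newline, carriage return, U+0020-U+D7FF, U+E000-U+FFFD, U+10000-U+10FFFF). It is not necessarily how the actual report is generated, and `check_line` and `check_dump` are hypothetical names.

```python
import re

# Anything outside the XML 1.0 "Char" production is an illegal XML character.
_ILLEGAL_XML = re.compile(r"[^\x09\x0A\x0D\x20-\uD7FF\uE000-\uFFFD\U00010000-\U0010FFFF]")


def check_line(raw: bytes, lineno: int) -> list[str]:
    """Return error messages for one raw line of a dump file (hypothetical helper)."""
    errors = []
    try:
        text = raw.decode("utf-8")
    except UnicodeDecodeError as exc:
        errors.append(f"line {lineno}: invalid UTF-8 at byte {exc.start}")
        # Decode leniently so the illegal-character scan can still run on this line.
        text = raw.decode("utf-8", errors="replace")
    for match in _ILLEGAL_XML.finditer(text):
        errors.append(f"line {lineno}: illegal XML character U+{ord(match.group()):04X}")
    return errors


def check_dump(path: str) -> list[str]:
    """Scan a whole dump file in binary mode, one line at a time (hypothetical helper)."""
    errors = []
    with open(path, "rb") as fh:
        for lineno, raw in enumerate(fh, start=1):
            errors.extend(check_line(raw, lineno))
    return errors
```

A line that fails to decode is re-decoded with replacement characters so the illegal-character scan still covers it; since U+FFFD is itself a legal XML character, the bad bytes are reported only once. Reading in binary mode keeps memory use flat even on a 1GB file.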

ODP RDF Exports

The RDF exports seem to be coming out of ODP like clockwork again. The first was riddled with errors, but the second is much, much better. Neither file had illegal XML characters, and only one had UTF-8 errors. With luck, the next one will be error-free. I’m going to attempt to create smaller RDFs of ODP subcategories for those who only need one or two categories and don’t like downloading the full 1GB RDF.
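Because the RDF dumps are plain XML, a category-specific extract can be produced by streaming through the file and keeping only the records filed under a given category. Below is a minimal sketch using Python’s standard `xml.etree.ElementTree.iterparse`. It is not necessarily how the split dumps are actually produced. It assumes a simplified layout in which each `Topic` element carries its category path in an `r:id` attribute and each `ExternalPage` names its category in a child `topic` element; the helper names (`records_under`, `category_of`, `local`) and the sample document are hypothetical.

```python
import io
import xml.etree.ElementTree as ET
from typing import BinaryIO, Iterator, Optional

# Assumed record types in the dump; adjust to whatever the real file uses.
RECORD_TAGS = ("Topic", "ExternalPage")


def local(name: str) -> str:
    """Strip a '{namespace}' prefix from a tag or attribute name."""
    return name.rsplit("}", 1)[-1]


def category_of(elem: ET.Element) -> Optional[str]:
    """Return the category path of a record under the assumed layout."""
    tag = local(elem.tag)
    if tag == "Topic":
        # Assumed: <Topic r:id="Top/Science/Robotics">
        for key, value in elem.attrib.items():
            if local(key) == "id":
                return value
    elif tag == "ExternalPage":
        # Assumed: <ExternalPage about="..."><topic>Top/...</topic></ExternalPage>
        for child in elem:
            if local(child.tag) == "topic":
                return child.text
    return None


def records_under(source: BinaryIO, prefix: str) -> Iterator[ET.Element]:
    """Stream records filed in `prefix` or any of its subcategories."""
    for _event, elem in ET.iterparse(source, events=("end",)):
        if local(elem.tag) not in RECORD_TAGS:
            continue
        cat = category_of(elem)
        # Match the category itself or a true subcategory, not a sibling sharing a prefix.
        if cat is not None and (cat == prefix or cat.startswith(prefix + "/")):
            yield elem  # the caller must use it before asking for the next record
        elem.clear()  # drop this record's children and text to limit memory use


if __name__ == "__main__":
    sample = b"""<RDF xmlns:r="http://www.w3.org/TR/RDF/">
<Topic r:id="Top/Science/Robotics"><catid>1</catid></Topic>
<ExternalPage about="http://example.com/"><topic>Top/Science/Robotics</topic></ExternalPage>
<Topic r:id="Top/Science/RoboticsX"/>
<Topic r:id="Top/Arts"/>
</RDF>"""
    for rec in records_under(io.BytesIO(sample), "Top/Science/Robotics"):
        print(local(rec.tag), category_of(rec))
```

Streaming matters here: building a full in-memory tree of a 1GB dump would likely be impractical, so each record is cleared as soon as the caller has handled it (for example, serialized it with `ET.tostring`). The parser still keeps the emptied element shells attached to the root, so a production version would prune those as well.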

Medical Mysteries

“You have the sort of sinuses an ENT dreams about”: not what you want to hear from your doctor after a CT scan. About four months ago I had a cold. A month later I still had it and began to suspect it wasn’t a cold. My ENT thought it might be an allergy. This was a depressing thought, as I’ve never had allergies before and didn’t look forward to being unable to breathe through my nose for the rest of my life. He put me on Zyrtec and Allegra. Another month later, things were worse, not better. The doc ordered a CT scan. That part was actually pretty cool: lying face-down on a platform that moved in and out of a rotating ring within a huge, pivoting mass while red targeting lasers lit up my head.

The result of the CT scan was a series of about 35 cross-sectional views of my head. It was a coronal view, so it only covered the front third or so of the head. But it was enough to show that all my sinuses, which should contain nothing but air, were solid masses. At least one was beginning to calcify. So the fix is to perform, let’s see, a total ethmoidectomy, a bilateral frontal resection, a bilateral anterior resection, and a bilateral sphenoidotomy. Three hours or more of nasal roto-rootering.

The immediate next step is trying to get the insurance company to approve the procedures.

When I get time, maybe I can scan the CT images and try to assemble them into a nice 3D view of my head.

ODP News

RDF export is still broken. Overall performance is still degrading rapidly; a lot of folks haven’t been able to access the public site at all lately. Since the ODP software isn’t designed to scale beyond one physical server and there’s nothing faster than the 16-CPU Sun 4500 on hand, they decided to give priority to the editors by throttling back public access. So we can edit to our hearts’ content, but you can’t see the site or use the RDF exports. The one developer who maintains the entire dmoz/ODP system and codebase is doing all she can, but AOL won’t allow outside help because it’s proprietary code. So we’re keeping our fingers crossed that she’ll get things fixed, but I have to admit that I’m not too optimistic at present. Not a happy situation.

There’s been a lot of talk about starting a new directory project to fill the ODP gap with something that’s not controlled by AOL and that’s not just Open Content but Open Source/Free Software as well. I’ve gotten email from several interested editors who want to work on the project, and I have been talking with two other programmers who’ve been working on a clone of the ODP software for a while. If anyone else is interested or has some ideas, feel free to contact me.

The Dangers of Blogging

Someone I know just lost her job for saying the wrong thing about the wrong person in her weblog. Yikes!

First Post: 2003

Well, I suppose it’s a bit late to be posting New Year’s resolutions, so I’ll skip straight on to other things. I’ve picked up a couple of new ODP categories to play with. ODP still hasn’t fixed the RDF export. They seem to be having some major scaling issues right now. I wish they’d accept some help from the many editors who’ve offered, but they seem determined not to.

The DPRG had to find a new meeting place this year. For several years we’ve been meeting at the Bill Priest Institute. Starting this month we’ve been offered meeting space at The Science Place, where we usually hold the Roborama and other competitions. So expect to see more robots and hackers wandering around The Science Place this year.