ODP, hierarchical organization, and other thoughts

I went to a google@work seminar in Dallas last week. It was mostly a sales pitch for Google’s enterprise services, but there were a few interesting bits such as getting a glimpse of Google’s intranet. Another thing stood out that prompted this post. Part of Google’s pitch is that hierarchical organization is dead. More than that, all hierarchical models of organization are bad. Whether it’s directories on your hard disk, folders on your desktop, folders in your email program, categorical tagging of RSS feeds, or topical organization of website contents, it’s all bad, bad, bad. The one true way, they claim, is to dump all your data into a single chaotic mess and “embrace the chaos”. By which they mean, of course, purchase Google Enterprise products and services to search for what you need. After all, how else will you ever find what you’re looking for when your data is now lost in the chaotic mess? Asking a company that specializes in searching unorganized data how to organize your data strikes me as being much like asking the barber if you need a haircut. The answer will profit someone but probably not you.

Somewhere during the PowerPoint presentation was a slide actually titled “Hierarchical organization is dead”, illustrated by a full-frame image of the Open Directory Project’s index page. The sad thing is not so much that they used this example, but that it was such a powerful example. It generated a fair amount of laughter from the audience as the Google guy talked about how sites like ODP used to think they could manually categorize the Internet. He asked how many of the 100+ people present used (or were even aware of) ODP or similar directories for finding things on the web; no hands were raised. Then he asked how many people used search engines like Google to find things on the web: all hands raised. More laughter.

This is one of two events that recently brought home to me just how dead ODP is. The other was when I tried to log in to my ODP editor account and discovered ODP was down. A little research revealed it had been down for quite a while. Apparently there was a hardware failure back in October of 2006. AOL techs managed to bungle the restore process somehow, resulting in the unrecoverable destruction of large amounts of ODP. Then they discovered they’d forgotten to make backups for the last few years. Oops. Since then, they’ve been slowly reconstructing things. The content itself was salvaged from one of the weekly data dumps but all or most of the editor metadata was lost. Information is scarce as AOL has mostly forgotten about ODP and ODP staff continue to be very secretive about everything that goes on. While a lot of public portions of ODP are back online, a lot of the editor functionality is still down six months later. At least one of the important servers used by the editors is still offline. The really surprising thing is not just that I hadn’t noticed ODP being down but that the web as a whole hadn’t noticed. There was a time when ODP being down for weeks would have been front page news on sites like Slashdot. This time, other than ODP editors and a few obscure SEO blogs, no one noticed it was gone.

While I don’t agree with Google’s conclusion that all hierarchical organization is bad, I think they are right in the case of web directories. A directory is simply not a useful or reasonable method of organizing web sites compared to more modern social bookmarking systems like del.icio.us or reddit. It’s an adapt or die world and, sadly, ODP doesn’t seem to be the sort of organization that can adapt to the changes taking place.

I expect ODP will limp along if AOL continues to allow it but I don’t hold out any hope that ODP is ever going to fully return from the dead. I’m still an editor and I will continue to assist them with data integrity checking on the weekly XML data dumps (which have finally resumed, by the way). However, I’m in the process of working with another editor to migrate the data dump checking process to an ODP server, so it won’t take up my time or energy anymore. I’m also spending far less time on my other ODP-related projects.

Speaking of social information processing, there was an interesting paper published by Kristina Lerman of USC this month on the subject, Social Information Processing in Social News Aggregation (PDF format). The paper looks at the way Digg exploits the power of social information processing to solve the problem of rating aggregated news stories.

Nigritude Ultramarine Update

While it’s unlikely that my Nigritude Ultramarine FAQ is going to reach the number one spot on Google by July 7, it’s been more than worthwhile participating in the Dark Blue SEO contest so far. I’ve been able to document a lot of the search engine activity and have learned some new things. For example, I was aware of many black hat SEO tricks that boost a page in the Google results, but I had no idea there were black hat tricks to directly attack a competitor’s site and push it down in the Google results. My page has been the subject of one cloaked page attack and several fraudulent Google spam reports so far.

I can also tell you that it’s possible to get a completely new site listed in Google within 48 hours and that Google updates their results every 24 hours. The page rank trust metric, on the other hand, may only be updated once a month. My page was first listed on May 9th and still has a page rank of zero. I expect this may change sometime in the next week.

Even without using any devious, black hat tricks, I’ve managed to stay in the top 15 results, out of over 350,000, with nothing but good design, actual content, and a handful of links (most of them due to the goodwill of a few other folks who enjoyed the page).

I did succumb to the temptation of one highly ranked link yesterday, however. I added a link to my contest page in the June edition of the Robot Competition FAQ, which is in the approved LoPIP and goes from news.answers to the RTFM MIT FAQ repository and eventually ends up on faqs.org. Faqs.org is, like ODP, one of those rare sites with a Google page rank of 9. I don’t think this is likely to boost my page’s position in the search results much but it should give me a higher page rank, which can’t hurt.

Nigritude Ultramarine Update

Since my last post, the Nigritude Ultramarine FAQ, my entry in the SEO contest, has been indexed by Google and is now showing up in the results. Unlike the hundreds of other pages out there, this is one of a handful that are actually intended to be interesting. It’s also one of the few that’s not using every dirty, black hat SEO trick in the book to try to cheat the Google rankings. No parasitic link farms, no referrer spamming, no keyword spamming, no page cloaking or any of those other things evil SEOs do to replace real Google search results with spam. Surprisingly, my page made it into the top 20 results anyway and is now bouncing around between positions 9 and 12.

That makes me happy mostly because it means Google must be doing a very good job at resisting the SEO attacks so far if a purely informational page is ranking that high. I have tried to optimize the page in the sense that it contains validated XHTML/CSS, minimal graphics, and correctly uses the meta description and keyword tags. And I seem to be getting some links to it from other sites, though nowhere near the thousands of links the SEO people are creating to their sites from link farms. Google showed 32k results this morning. There are only about 200-300 contestants so the rest of those must be sites linking to the contestants. I’ve probably got no more than 20 links to my page at present. (Feel free to help me out by adding one, if you’d like!)

One thing that I found surprising is how far some of the SEO experts are willing to go to pump up their pages. Within two hours of my site being picked up and shown in the Google results, someone filed a Google spam report against it. After doing some searches on SEO discussion forums, I discovered this is standard operating procedure for some SEOs. They file spam reports against any sites close to or higher than theirs in the Google results in the hope that Google will pull their competitor’s site from the database.

Nigritude Ultramarine

A recent Slashdot article brought to my attention the DarkBlue SEO Challenge, a contest with the goal of getting a webpage to the number one position in Google’s results for the search phrase “nigritude ultramarine”. I decided to take a whack at it despite the unfortunate choice of words (most people I’ve mentioned it to seem to think nigritude has a vaguely racial-slur sort of sound, but it actually means the “state of being the color black”).

The first step was to grab a domain, so I grabbed nigritudeultramarines.com from GoDaddy. The name was picked up by the root name servers last night and now I’m on the way. Most of the other pages I looked at were jokes or meaningless tangles of links connected to parasitic link farms. The link farms seem to be a typical trick used by SEO “experts” to attack Google’s Page Rank trust metric. It works in much the same way as if a user created hundreds of fake mod_virgule accounts at Advogato or robots.net and used them to certify their main account. Google’s page rank is somewhat resistant to this type of attack but if enough trusted sites link to the attacker’s link farm, the attack can be successful. A favorite ploy is for the attacker to add recently expired domains that still have ODP listings to the link farm. ODP links have a very high trust value (as high as 9 or 10) and just a few such domains can boost the page rank of a parasitic link farm tremendously.
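For anyone curious how a link farm plays out numerically, here is a toy sketch in Python. To be clear, this is the simplified textbook form of PageRank (power iteration with a damping factor), not whatever Google actually runs, and every page name in it ("hub", "honest", "directory", "spam", "farm0" and so on) is made up for illustration. It compares a farm that sits in isolation with the same farm after a trusted directory page links to one of its domains, which is roughly the expired-domain ploy described above.

```python
def pagerank(links, damping=0.85, iterations=100):
    """Textbook PageRank by power iteration.

    links maps each page name to the list of pages it links to.
    Returns a dict of page name -> rank, with ranks summing to 1.
    """
    pages = set(links)
    for targets in links.values():
        pages.update(targets)
    n = len(pages)
    rank = {page: 1.0 / n for page in pages}

    for _ in range(iterations):
        # Every page gets the same small "random jump" share.
        new_rank = {page: (1.0 - damping) / n for page in pages}
        for page in pages:
            targets = links.get(page, [])
            if targets:
                # A page splits its damped rank evenly among its links.
                share = damping * rank[page] / len(targets)
                for target in targets:
                    new_rank[target] += share
            else:
                # A page with no outbound links spreads its rank evenly.
                share = damping * rank[page] / n
                for target in pages:
                    new_rank[target] += share
        rank = new_rank
    return rank


def build_web(farm_size, trusted_link_into_farm):
    """Build a tiny hypothetical web: three honest pages plus a link farm."""
    links = {
        "hub": ["honest", "directory"],
        "honest": ["hub"],
        "directory": ["honest"],
        # The spam page and its farm pages only link to each other.
        "spam": [f"farm{i}" for i in range(farm_size)],
    }
    for i in range(farm_size):
        links[f"farm{i}"] = ["spam"]
    if trusted_link_into_farm and farm_size > 0:
        # Stand-in for an expired domain that still has a directory listing.
        links["directory"] = ["honest", "farm0"]
    return links


if __name__ == "__main__":
    isolated = pagerank(build_web(10, trusted_link_into_farm=False))
    boosted = pagerank(build_web(10, trusted_link_into_farm=True))
    for name in ("honest", "spam"):
        print(f"{name}: isolated={isolated[name]:.3f} boosted={boosted[name]:.3f}")
```

In this toy model the farm is a closed loop, so any rank that flows in from the trusted side never flows back out. That is why a single inbound link from a trusted page raises the spam page's rank and lowers the honest page's rank at the same time. A web this small exaggerates the effect considerably, so treat the numbers as illustrative only.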

ODP has been putting a lot of effort into combating this and other SEO attacks. Google also expends a lot of effort tweaking their page ranking algorithms to untangle the mess SEO experts make of things. Thinking about this gave me the idea of making a purely informational site built in the traditional Internet style to see how it would compare in the Google rankings to a typical “expert” optimized page. Would it be overwhelmed by link farm boosted pages? Would the Internet community favour links to it over the contentless pages? Would anybody even care? ;-)

To this end, I’ve created the Nigritude Ultramarine FAQ. It contains, you guessed it, frequently asked questions about the whole nigritude ultramarine thing. If you have a question about the subject, serious or not, feel free to stop by and ask it. And if you’d like to help me out, I wouldn’t mind a few more links from other sites. Just a simple contextual link with the text nigritude ultramarine – nothing tricky please; no contentless link farms, link spamming, referrer spamming, or the like.

The site has been up for a full 24 hours now and is already getting a fairly steady stream of visitors. The referers seem to be personal blogs, so someone noticed and began spreading the word before me. I have submitted the URL to Google but haven’t been visited by GoogleBot yet. I did get an unsolicited visit from the Ask Jeeves/Teoma spider within hours of going live and was also hit by the QuePasa.com robot today (not sure how either discovered the site).

So far I have acquired inbound links from several PR7 sites, so hopefully I’ll start out with a reasonably high placement in the search results. On the other hand, some of the SEO folks out there have pages with 4,000+ inbound links from their link farms so this may be a futile exercise. Time will tell.

Update: As I was writing this entry, Googlebot hit the site. We’ll see if I make it into the results by tomorrow. I’ll make some webalizer stats of the traffic available if anyone is interested.