BSA, ODP, RDF, and other TLAs

We received another BSA threat-letter at NCC on Friday. That’s two in as many weeks. It was the usual collection of vague threats that if we didn’t rush out and buy some Microsoft, Adobe, and Macromedia software, the BSA might have to search our office for unlicensed software and fine us a few million dollars. This time I called their toll-free number and told them to remove us from their mailing list because we were tired of getting lame marketing letters disguised as legal threats. (Feel free to call them yourself and let them know what you think of them – hey, it’s a free call! 1-877-536-4BSA.) I also told them we’d instituted a company-wide policy to discontinue the use of all software products made by BSA members in favor of Free/Open equivalents because of the BSA’s marketing-by-extortion methods. The girl I was talking with claimed they had removed our address from their list and, after I specifically asked her twice, claimed she had made a note in our file about our new policy. So, will they really remove us from their list, or will they put us on the list of companies to target with audits? Time will tell.

ODP finally solved the problem with RDF generation, and a new RDF dump showed up on the 13th. The downside is that the dump is still riddled with invalid UTF-8 sequences and illegal XML characters. For the last RDF dump, I provided the offsets of many of the errors by waiting for an XML parser to bomb out and then looking for the problem with a hex editor (which is time-consuming when you have to start over on a 1GB file after each error). This time I decided to be lazy and wrote some quick C code to do it for me. Strangely, a search on the net failed to turn up any UTF-8 or XML checkers that would work on arbitrarily large files. And most XML validators don’t check for illegal XML characters or invalid UTF-8 sequences; they simply fail unrecoverably when they hit one. Anyway, I processed the RDF files and posted a list of the errors in the latest dump. So with a little luck, the next RDF dumps will be much cleaner.
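The C code I used isn’t shown here, but a checker of this kind is easy to sketch. The version below is a minimal illustration, assuming the input is supposed to be UTF-8 encoded XML 1.0. It streams the file in fixed-size chunks, so memory use stays flat no matter how large the dump is, and reports the byte offset of every invalid UTF-8 sequence and every character outside the XML 1.0 `Char` production instead of stopping at the first one. The names `utf8_check`, `utf8_check_feed`, `utf8_check_end`, and `check_file` are hypothetical, made up for this sketch. It only checks encoding and allowed characters, not XML well-formedness.

```c
#include <stdio.h>
#include <stdint.h>
#include <stddef.h>

/* Decoder state. It persists across chunks, so a multi-byte sequence
   split between two fread() calls is still decoded correctly. */
typedef struct {
    uint64_t offset;      /* absolute offset of the next byte to consume */
    uint64_t seq_start;   /* offset where the current multi-byte sequence began */
    uint32_t cp;          /* code point being assembled */
    uint32_t min_cp;      /* smallest legal value for this length (overlong check) */
    int need;             /* continuation bytes still expected */
    unsigned long errors; /* number of problems reported so far */
} utf8_check;

static void report(utf8_check *st, uint64_t off, const char *what) {
    st->errors++;
    fprintf(stderr, "offset %llu: %s\n", (unsigned long long)off, what);
}

/* XML 1.0 Char: #x9 | #xA | #xD | [#x20-#xD7FF] | [#xE000-#xFFFD] | [#x10000-#x10FFFF] */
static int xml_char_ok(uint32_t c) {
    return c == 0x9 || c == 0xA || c == 0xD ||
           (c >= 0x20 && c <= 0xD7FF) ||
           (c >= 0xE000 && c <= 0xFFFD) ||
           (c >= 0x10000 && c <= 0x10FFFF);
}

/* Called once a multi-byte sequence is complete. */
static void finish_cp(utf8_check *st) {
    if (st->cp < st->min_cp)
        report(st, st->seq_start, "overlong UTF-8 sequence");
    else if (st->cp >= 0xD800 && st->cp <= 0xDFFF)
        report(st, st->seq_start, "UTF-16 surrogate encoded in UTF-8");
    else if (st->cp > 0x10FFFF)
        report(st, st->seq_start, "code point above U+10FFFF");
    else if (!xml_char_ok(st->cp))
        report(st, st->seq_start, "character not allowed in XML 1.0");
}

/* Feed one chunk of input; may be called any number of times. */
void utf8_check_feed(utf8_check *st, const unsigned char *buf, size_t len) {
    for (size_t i = 0; i < len; i++, st->offset++) {
        unsigned char b = buf[i];
        if (st->need > 0) {
            if ((b & 0xC0) == 0x80) {            /* expected continuation byte */
                st->cp = (st->cp << 6) | (b & 0x3F);
                if (--st->need == 0)
                    finish_cp(st);
                continue;
            }
            /* Sequence cut short: report it, then treat b as a fresh byte. */
            report(st, st->seq_start, "truncated UTF-8 sequence");
            st->need = 0;
        }
        if (b < 0x80) {
            if (!xml_char_ok(b))
                report(st, st->offset, "control character not allowed in XML 1.0");
        } else if ((b & 0xC0) == 0x80) {
            report(st, st->offset, "stray UTF-8 continuation byte");
        } else if ((b & 0xE0) == 0xC0) {
            st->cp = b & 0x1F; st->need = 1; st->min_cp = 0x80; st->seq_start = st->offset;
        } else if ((b & 0xF0) == 0xE0) {
            st->cp = b & 0x0F; st->need = 2; st->min_cp = 0x800; st->seq_start = st->offset;
        } else if ((b & 0xF8) == 0xF0 && b <= 0xF4) {
            st->cp = b & 0x07; st->need = 3; st->min_cp = 0x10000; st->seq_start = st->offset;
        } else {
            report(st, st->offset, "invalid UTF-8 lead byte");
        }
    }
}

/* Call after the last chunk to catch a sequence cut off by end of input. */
void utf8_check_end(utf8_check *st) {
    if (st->need > 0) {
        report(st, st->seq_start, "truncated UTF-8 sequence at end of input");
        st->need = 0;
    }
}

/* Stream an arbitrarily large file through the checker; returns the error count. */
unsigned long check_file(FILE *fp) {
    static unsigned char buf[1 << 16];
    utf8_check st = {0};
    size_t n;
    while ((n = fread(buf, 1, sizeof buf, fp)) > 0)
        utf8_check_feed(&st, buf, n);
    utf8_check_end(&st);
    return st.errors;
}
```

A driver only needs to open the dump in binary mode with `fopen(path, "rb")` and pass the handle to `check_file`; every problem goes to stderr with its offset, ready to paste into a bug report. Keeping the decoder state in a struct rather than in local variables is what lets a sequence straddle a chunk boundary without producing a false error.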

Medical Mysteries

“You have the sort of sinuses an ENT dreams about” – not what you want to hear from your doctor after a CT scan. About four months ago I had a cold. A month later I still had it and began to suspect it wasn’t a cold. My ENT thought it might be an allergy. This was a depressing thought, as I’ve never had allergies before and didn’t look forward to being unable to breathe through my nose for the rest of my life. He put me on Zyrtec and Allegra. Another month later, things were worse, not better. The doc ordered a CT scan. That part was actually pretty cool: lying face-down on a platform that moved linearly in and out of a rotating ring within a huge, pivoting mass while red targeting lasers lit up my head.

The result of the CT scan was a series of about 35 cross-sectional views of my head. It was a coronal view, so it only covered the front third or so of the head. But it was enough to show that all my sinuses, which should contain nothing but air, were solid masses. At least one was beginning to calcify. So the fix is to perform, let’s see, a total ethmoidectomy, a bilateral frontal resection, a bilateral anterior resection, and a bilateral sphenoidotomy. Three hours or more of nasal roto-rootering.

The immediate next step is trying to get the insurance company to approve the procedures.

When I get time, maybe I can scan the CT images and try to assemble them into a nice 3D view of my head.

ODP News

RDF export is still broken. Overall performance is still degrading rapidly – a lot of folks haven’t been able to access the public site at all lately. Since the ODP software isn’t designed to scale beyond one physical server, and there’s nothing faster than the 16-CPU Sun 4500 on hand, they decided to give priority to the editors by throttling back public access. So we can edit to our hearts’ content, but you can’t see the site or use the RDF exports. The one developer who maintains the entire dmoz/ODP system and codebase is doing all she can, but AOL won’t allow outside help because it’s proprietary code. So we’re keeping our fingers crossed that she’ll get things fixed, but I have to admit that I’m not too optimistic at present. Not a happy situation.

There’s been a lot of talk about starting a new directory project to fill the ODP gap with something that’s not controlled by AOL and that’s not just Open Content but Open Source/Free Software as well. I’ve gotten email from several interested editors who want to work on the project and have been talking with two other programmers who’ve been working on a clone of the ODP software for a while. If anyone else is interested or has some ideas, feel free to contact me.

The Dangers of Blogging

Someone I know just lost her job for saying the wrong thing about the wrong person in her weblog – yikes!

First Post: 2003

Well, I suppose it’s a bit late to be posting New Year’s resolutions, so I’ll just skip straight on to other things. I’ve picked up a couple of new ODP categories to play with. ODP still hasn’t gotten the RDF export fixed. They seem to be having some major scaling issues right now. Wish they’d accept some help from the many editors who’ve offered, but they seem determined not to.

The DPRG had to find a new meeting place this year. For several years we’ve been meeting at the Bill Priest Institute. Starting this month we’ve been offered meeting space at The Science Place, where we usually hold the Roborama and other competitions. So expect to see more robots and hackers wandering around The Science Place this year.

XTM, RDF, DAML, OIL, and other uses of XML

My recent discussions with the ODP guys about open-sourcing the ODP backend software have led me to read up on RDF, which is the format used by ODP for exporting the ontological information and content of the directory. One thing I immediately ran into was XTM, the ISO standard for creating XML Topic Maps. These seem to me to be competing standards in that they both use XML to describe ontological information. RDF seems to be enjoying much more widespread use on the web but I’m playing catch-up in this particular area right now, so I may be missing some uses of XTM. One helpful document I’ve found is a paper by Lars Marius Garshol comparing XTM, RDF, and two RDF extensions, DAML and OIL. If anyone knows of other introductory-level documents describing the similarities and differences of XTM and RDF, I’d be curious to hear about them.

Turkeys and Free Software

I hope everyone had a nice Turkey day. Ours went fairly well. We ate a lot of food, visited with friends and relatives, and ate more tasty food. I also went for four entire days without doing any work. I did stop by my office once but only to check email. Speaking of email, I notice spam levels continue to rise. Our mail server blocked 32,614 spams during November – and I still ended up with 30 or so per day making it to my mailbox. I’d say the majority of the SMTP traffic on our network is now spam.

Well, it turns out that even though DMOZ/ODP provides “open content” they do it using closed software. So no luck with my offer to help them debug the RDF export problem. They had plenty of other offers to help from editors who were also hackers but all the offers of help in the world won’t do any good if they keep their source code secret. I posted links to some Free Software/Open Source propaganda in the hopes it might change a mind or two but I’m not going to hold my breath.

ODP Data Export Problems

I became aware of an interesting DMOZ/ODP problem today. Apparently the RDF export code has been a bit broken lately, and there hasn’t been a successful RDF export of the ODP content since September 22. This means that secondary sites such as Google that rely on ODP data aren’t able to see any of the changes we’ve made in the last month. I tracked down an editors’ forum with a discussion about the problem and volunteered to help out. I’m not even sure where one can download the code or what language it’s written in yet, but we’ll see what happens.