mod_virgule and UTF-8 weirdness

I’m seeing more and more UTF-8 related issues pop up in code lately for some reason. Much of the debugging work I’ve done with the ODP XML dumps has been tracking down illegal XML characters and invalid UTF-8 byte sequences.

Now I’ve run across a related bug in mod_virgule. The trust metrics on robots.net stopped working a few days ago and today I took some time to track down the reason. It turned out to be an interesting little issue with the way mod_virgule handles the storage of data in the XML database. I’ve implemented a temporary work-around that has things working safely again but I think a longer term fix is needed.

I posted to the virgule_dev mailing list about the problem but it’s been pretty much dead for the past few months. Basically what happened is a foreign user posted some data to their user profile using a funky non-UTF-8 compatible character set. The result was a corrupt profile.xml file for that user account. That, in turn, led to Apache segfaulting during each subsequent attempt by mod_virgule to process the trust metric. Because of the segfault there was no error reporting to alert anyone of the problem and it took several days before anyone noticed that something was wrong.

The root of the problem seems to be that mod_virgule simply takes whatever raw data a user puts in a form and passes it directly to xmlSetProp(). This works great as long as you only give it valid UTF-8 data but it’s not designed to work on anything else. It seems to me that four things need to be done to fix this:

  • Pages need to explicitly specify UTF-8 as the character encoding
  • All form data needs to be validated before passing to libxml
  • Invalid data needs to be converted or rejected
  • The trust metric code needs some additional error handling
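To make the validation step a bit more concrete, here is a minimal sketch in plain C of the kind of check that could run on a form value before it ever reaches xmlSetProp(). It is not taken from mod_virgule; the function name form_value_is_xml_safe is made up for illustration. It rejects malformed UTF-8 (bad lead bytes, truncated or overlong sequences, surrogates, values above U+10FFFF) as well as code points that XML 1.0 does not allow, such as most control characters.

```c
#include <stddef.h>

/*
 * Hypothetical helper (not part of mod_virgule or libxml2).
 * Returns 1 if buf[0..len) is well-formed UTF-8 and every code point
 * is a legal XML 1.0 character, 0 otherwise.
 */
static int form_value_is_xml_safe(const unsigned char *buf, size_t len)
{
    size_t i = 0;

    while (i < len) {
        unsigned char c = buf[i];
        unsigned long cp;   /* decoded code point */
        size_t need;        /* continuation bytes expected after c */
        size_t k;

        if (c < 0x80) {
            /* ASCII: XML 1.0 allows only tab, LF and CR below 0x20 */
            if (c < 0x20 && c != 0x09 && c != 0x0A && c != 0x0D)
                return 0;
            i++;
            continue;
        } else if ((c & 0xE0) == 0xC0) {
            need = 1; cp = c & 0x1F;
        } else if ((c & 0xF0) == 0xE0) {
            need = 2; cp = c & 0x0F;
        } else if ((c & 0xF8) == 0xF0) {
            need = 3; cp = c & 0x07;
        } else {
            return 0;       /* stray continuation byte or invalid lead byte */
        }

        if (len - i <= need)
            return 0;       /* sequence is cut off at the end of the buffer */

        for (k = 1; k <= need; k++) {
            unsigned char cc = buf[i + k];
            if ((cc & 0xC0) != 0x80)
                return 0;   /* not a continuation byte */
            cp = (cp << 6) | (cc & 0x3F);
        }

        /* Overlong encodings: the value would have fit in fewer bytes */
        if ((need == 1 && cp < 0x80) ||
            (need == 2 && cp < 0x800) ||
            (need == 3 && cp < 0x10000))
            return 0;

        /* Surrogates, U+FFFE/U+FFFF and anything past U+10FFFF are not XML chars */
        if ((cp >= 0xD800 && cp <= 0xDFFF) ||
            cp == 0xFFFE || cp == 0xFFFF || cp > 0x10FFFF)
            return 0;

        i += need + 1;
    }
    return 1;
}
```

A caller would run this on each form field and either reject the post or convert the data when it returns 0. libxml2 also ships its own xmlCheckUTF8() function, which may be enough for the UTF-8 half of the check; the sketch above is just meant to show what is being tested.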

If anyone has any thoughts on this or has had a similar experience with mod_virgule, I’d be curious to hear about it.

Buying a Mini-ITX

After playing with the VIA Technologies Mini-ITX board for the robots.net review, I decided to buy one and put together a system for NCC to try out as a Linux server. I immediately ran into a problem: it’s hard to find anyplace to buy one. We normally buy our hardware through Tech Data, a large national distributor, or through ASI, a somewhat smaller distributor that specializes in hardware produced in Asian countries. Neither carries VIA Technologies products. I tried a number of local hardware distributors without luck and finally ended up with the local Frys store as the only option. Frys is fine for some things but it’s not a place I like to buy anything mission critical like a server motherboard. But I convinced myself it couldn’t be that much of a risk and picked up a Mini-ITX M10000 board at the Irving Frys Wednesday morning.

Immediately upon opening the box, I realized I was in trouble. The motherboard was not in an anti-static bag and didn’t have the pink anti-static mat under it like the demo we reviewed; it was just lying in the bottom of the cardboard box. It was missing assorted jumpers and a few other parts. And the bottom of the board had several discolored areas of the type caused by severe overheating. I was pretty sure it was toast but connected it to a power supply and monitor to make sure. Yep, it was dead.

Upon further examination, I noticed a couple of square white stickers on the outside of the box that looked like some sort of Frys quality control info. They had handwritten dates and several paragraphs of fine print about manufacturers’ warranties and such. A couple of lines into the first paragraph of the second sticker was the phrase “this product may have been returned”. Yikes. Someone had bought this board, toasted it, returned it to Frys, and they’d put it back out on the shelf with the new products.

Back at Frys, I attempted to return the board and get an actual new, unopened, unreturned, untoasted one. It took a little work to get them to take it back. At first they said I couldn’t return it because I hadn’t brought back the anti-static bag and the CD (I hadn’t even noticed the missing CD until then). I pointed out that it was also missing some jumpers and was completely dead. The Frys return clerk decided to check another box from the shelf and see what was in it. Interestingly, the box he pulled was missing the CD, the ATX back plate, and the cables. Turns out he did exactly what I did: he grabbed a box from the shelf thinking it was new but it had the well-hidden “this is a defective return product” blurb on it. This convinced him to give me a refund.

I checked the shelf but all four of the remaining M10000 boxes were returns. Yesterday I drove out to the really big Frys in Arlington and they had about ten VIA Mini-ITX boards. I found a total of four M10000 boards that weren’t customer returns and bought one of them. I looked around at some of the other motherboards and it looks like it’s SOP for Frys to mix defective customer returns in with new products on the same shelf. Most reputable stores have a special section where they offer customer returns at a discount.

Anyway, the new board was just like the demo we reviewed: well packed, in an anti-static bag, with all the parts there. It fired right up the first time and ran beautifully. In the end, I guess there are two morals to the story. 1) Be careful when buying motherboards at Frys and 2) VIA Technologies needs to work on getting their product into normal distribution channels like Tech Data and ASI.

Mini-ITX Boards

I’ve been catching up on my ToDo list the last couple of weeks. Most of it was boring work-related stuff. But among the fun things, I finally got the review of VIA Technologies’ Mini-ITX board posted on robots.net. The board will now pass on to the DPRG where it will hopefully end up in a robot or be put to some other equally creative use. I liked it so much, I think we may buy a couple of Mini-ITX boxes to try out as Linux servers at NCC. If nothing else, they make a lot less noise than our new Dell 1750 does.

I also managed to put a few more fixes and patches into my fork of the mod_virgule code and released a new version today. The libxml2 patch has been in place since the last release and has been working pretty well. I added some minor cosmetic fixes today to make the XML output nicely formatted again. I also incorporated a patch from James Henstridge that adds RSS link elements for diaries.

To get away from the computer for a while this afternoon, we went to an exhibit of Martha E. Simkins paintings at the Irving Arts Center.