mod_virgule Attack Resistance

lkcl and redi have commented on the ongoing trust metric attack on mod_virgule sites, noting the effects on Advogato. The same thing is happening to other mod_virgule sites, including robots.net and ghostscript. I emailed Raph a warning about this activity in May, when I first noticed automated programs creating large numbers of identical accounts on all three sites. I don’t want to link to any examples directly, but try googling “dltxprt” or manually typing in the user URL to see an example user on all three of the mentioned sites. I’ve been tracking IPs and account names on robots.net so I can kill them all off if needed, but so far the trust metric has resisted the attack effectively.

The spammer is using the notes field of each account for search engine link spamming but otherwise isn’t causing much immediate harm beyond resource abuse. I have working code to delete mod_virgule accounts but I’m still pondering how best to use it to remove the evildoers in this case.

The blog spam seems limited to Advogato for some reason. If it starts on robots.net, I think my solution will be to remove the A tag from the list of tags that can be used by observers. I don’t want to remove the ability of observers to post blog entries, as lkcl suggested, because that’s the only way we find out enough about some new users to decide whether they should receive a higher trust ranking.
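
If it comes to that, the change would amount to dropping the A tag from the observer-level allow-list. Here’s a minimal sketch of the idea, assuming a per-trust-level tag table; the names are illustrative, not mod_virgule’s actual identifiers:

    #include <string.h>

    /* Hypothetical per-trust-level tag allow-lists. Observers lose
     * the "a" tag so their posts can't carry spam links; certified
     * users keep it. */
    static const char *observer_tags[]  = { "p", "b", "i", "blockquote", NULL };
    static const char *certified_tags[] = { "p", "b", "i", "blockquote", "a", NULL };

    /* Return 1 if the tag is permitted at this trust level, 0 if it
     * should be stripped or escaped from the post. */
    static int tag_allowed(const char *tag, int is_observer)
    {
        const char **allowed = is_observer ? observer_tags : certified_tags;
        for (; *allowed != NULL; allowed++)
            if (strcmp(tag, *allowed) == 0)
                return 1;
        return 0;
    }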

One interesting thing to note is that almost all of the spammer’s accounts certify each other, creating what Google refers to as a “bad neighborhood” in webpage trust rank terminology. If you have a legitimate webpage and link to a “bad neighborhood” it can adversely affect your own page’s rank. It might be wise to implement something similar in mod_virgule. If a legitimate, trusted user certifies an untrusted user in a “bad neighborhood”, maybe it should result in decrementing the trust of the legitimate user rather than increasing the trust of the bogus user? Just a thought.
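
Purely as a sketch of that thought (none of this exists in mod_virgule; the account structure and in_bad_neighborhood() are made up for illustration, and actually detecting the cluster is the hard part), the cert flow might look something like this:

    /* Purely illustrative; nothing here is real mod_virgule code. */
    typedef struct {
        double trust;   /* current trust level of the account */
    } account;

    /* Stub: real detection of a mutually-certifying spam cluster
     * (cert-graph clustering?) is the hard part and is left out. */
    static int in_bad_neighborhood(const account *a)
    {
        (void)a;
        return 0;
    }

    /* When a trusted account certifies into a bad neighborhood,
     * charge the certifier instead of crediting the spammer. */
    static void apply_cert(account *from, account *to, double weight)
    {
        if (in_bad_neighborhood(to))
            from->trust -= weight;               /* penalty flows backward */
        else
            to->trust += weight * from->trust;   /* normal cert flow */
    }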

Mod_Virgule Update

I’ve posted another release of my mod_virgule fork this week. Grab the source or take a look at the changelog. This one includes the new and improved configuration handling code. Instead of loading and parsing the entire website configuration on each and every hit, it’s now loaded once per Apache process and stored in thread-private memory that persists across requests. Much more efficient. I also did some more general code cleanup and removed more of the hard-coded stuff that makes it hard to use mod_virgule without editing the source code. There are still one or two hard-coded things that I need to make configurable. Maybe in the next release. It’s getting close to a completely configurable system that could be compiled, installed, and configured for any site. Anyway, the new code has been running on robots.net for a couple of weeks now and appears stable.
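
For the curious, the shape of the change is roughly the following. This is a simplified sketch, assuming a prefork-style single-threaded Apache process (where a plain static pointer is effectively thread-private); the names are mine, not the actual mod_virgule symbols:

    #include <stdlib.h>
    #include <libxml/parser.h>

    /* Hypothetical config wrapper; the real structure differs. */
    typedef struct {
        xmlDocPtr doc;   /* parsed site configuration document */
    } site_config;

    static site_config *cached_config = NULL;   /* lives for the process */

    static site_config *load_site_config(const char *path)
    {
        site_config *cfg = malloc(sizeof(*cfg));
        if (cfg == NULL)
            return NULL;
        cfg->doc = xmlParseFile(path);   /* parse the XML exactly once */
        if (cfg->doc == NULL) {
            free(cfg);
            return NULL;
        }
        return cfg;
    }

    /* Called at the top of each request handler: the first hit in a
     * process pays the parse cost, every later hit reuses the cache. */
    static site_config *get_site_config(const char *path)
    {
        if (cached_config == NULL)
            cached_config = load_site_config(path);
        return cached_config;
    }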

Advogato Weirdness

I’ve noticed that Advogato has been suffering from the same mysterious, random file loss problem that hit robots.net a while back. As best I can tell this is due to a race condition in the “out of disk space” patch. My problems ended once I reverted the patch in my mod_virgule codebase. On the upside, though, maybe this will prompt a line or two of new code for the official mod_virgule in 2005.
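
I haven’t traced the patch itself, so this is only a guess at the failure mode: if the save path truncates or replaces the real file before the new contents are safely on disk, a badly-timed failure or concurrent request can leave the file empty or missing. The standard defense is to write a temp file and rename() it over the original, roughly like this (hypothetical helper, not mod_virgule code):

    #include <stdio.h>

    /* Safe save pattern: never touch the real file until the new
     * contents are fully written. rename() atomically replaces the
     * target on POSIX filesystems, so readers see either the old
     * file or the new one, never a partial or missing file. */
    static int save_file_safely(const char *path, const char *data)
    {
        char tmp[1024];
        snprintf(tmp, sizeof(tmp), "%s.tmp", path);

        FILE *fp = fopen(tmp, "w");
        if (fp == NULL)
            return -1;
        if (fputs(data, fp) == EOF || fclose(fp) == EOF) {
            remove(tmp);          /* write failed: original untouched */
            return -1;
        }
        return rename(tmp, path); /* atomic swap */
    }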

Random News Updates

Halloween

We didn’t have many trick-or-treaters stopping by our house to stock up on candy last Halloween, so we’re not planning on much activity this year. And we don’t have any Halloween costume parties in our plans either. You won’t even find a pumpkin with a candle inside at our house this year. But even if Halloween isn’t a busy holiday in Dallas this year, it looks like others are going all out. Take a look at these Halloween decorations (and if you think that’s cool, check out his Christmas lights).

Kittens

Our unexpected collection of kittens has reached the age where they’re ready for adoption and we’ve already found homes for a couple of them. Mother cat has now been upgraded to prevent anomalies in the kitten population. We briefly gained a baby squirrel in addition to all the baby cats. It just appeared in the garage with the kittens last Saturday. The Humane Society provided us with the number of a local squirrel and opossum rehabilitator who took the little creature in. They said thunderstorms frequently blow the babies out of their nests and cats sometimes mistake them for kittens and bring them home, which appears to be what happened in our case.

mod_virgule

Time for another mod_virgule release. One of the downsides to the growth of robots.net has been the inability of mod_virgule to handle the rapidly expanding user list. Mod_virgule’s user list code was very inefficient: it had to parse the entire user list, sort it in memory, and then do additional lookups for each user. And it could only display the result as a single page. With over 6,000 users, trying to display it brought the whole site to a crawl for 15 to 30 seconds. I’ve now completely recoded the user list functions to be many times faster and to provide the results in a nicely sorted, multi-page format. Care to see the results? Compare Advogato’s user list (which still uses the old mod_virgule) to the new Robots.net user list. As a side benefit, the new user list pages are completely configurable through an XML file rather than hardcoded in mod_virgule itself.
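
The core of the fix is simply to stop rendering everything at once. Here’s a rough sketch of the approach, assuming the name list is sorted and then only one page’s slice is emitted; the function names and page size are illustrative, not the actual mod_virgule code:

    #include <stdio.h>
    #include <stdlib.h>
    #include <string.h>

    #define USERS_PER_PAGE 100

    static int cmp_name(const void *a, const void *b)
    {
        return strcmp(*(const char * const *)a, *(const char * const *)b);
    }

    /* Sort the names, then emit only the requested page instead of
     * dumping all 6,000+ users into a single response. */
    static void render_user_page(char **names, int count, int page)
    {
        qsort(names, count, sizeof(*names), cmp_name);

        int start = page * USERS_PER_PAGE;
        int end = start + USERS_PER_PAGE;
        if (start > count) start = count;
        if (end > count)   end = count;

        for (int i = start; i < end; i++)
            printf("%s\n", names[i]);   /* real code emits HTML rows */
    }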

mod_virgule and UTF-8 weirdness

I’m seeing more and more UTF-8 related issues pop up in code lately for some reason. Much of the debugging work I’ve done with the ODP XML dumps has been tracking down illegal XML characters and invalid UTF-8 byte sequences.

Now I’ve run across a related bug in mod_virgule. The trust metrics on robots.net stopped working a few days ago and today I took some time to track down the reason. It turned out to be an interesting little issue with the way mod_virgule handles the storage of data in the XML database. I’ve implemented a temporary work-around that has things working safely again but I think a longer term fix is needed.

I posted to the virgule_dev mailing list about the problem but it’s been pretty much dead for the past few months. Basically what happened is a foreign user posted some data to their user profile using a funky non-UTF-8 compatible character set. The result was a corrupt profile.xml file for that user account. That, in turn, led to Apache segfaulting during each subsequent attempt by mod_virgule to process the trust metric. Because of the segfault there was no error reporting to alert anyone of the problem and it took several days before anyone noticed that something was wrong.

The root of the problem seems to be that mod_virgule simply takes whatever raw data a user puts in a form and passes it directly to xmlSetProp(). This works great as long as you only give it valid UTF-8 data but it’s not designed to work on anything else. It seems to me that four things need to be done to fix this (a rough sketch of the validation step follows the list):

  • Pages need to explicitly declare UTF-8 as the character encoding
  • All form data needs to be validated before passing to libxml
  • Invalid data needs to be converted or rejected
  • The trust metric code needs some additional error handling
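
The second and third items boil down to a check like the one below before anything reaches xmlSetProp(). It relies on libxml2’s xmlCheckUTF8(); sanitize_form_value() and set_profile_field() are names I made up for illustration, not existing mod_virgule functions:

    #include <libxml/tree.h>
    #include <libxml/xmlstring.h>

    /* Return the value only if it is well-formed UTF-8; NULL tells
     * the caller to reject the submission (or convert it first)
     * instead of writing bad bytes into profile.xml. */
    static const char *sanitize_form_value(const char *raw)
    {
        if (raw == NULL || !xmlCheckUTF8((const unsigned char *)raw))
            return NULL;
        return raw;
    }

    /* Usage: only validated data ever reaches libxml. */
    static void set_profile_field(xmlNodePtr node, const char *raw)
    {
        const char *val = sanitize_form_value(raw);
        if (val != NULL)
            xmlSetProp(node, (const xmlChar *)"value", (const xmlChar *)val);
        /* else: report the error to the user up front rather than
         * segfaulting in the trust metric days later */
    }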

If anyone has any thoughts on this or has had a similar experience with mod_virgule, I’d be curious to hear about it.

Sousa, Meat Paddles, and Clones

I’d better catch up on news before I start falling too far behind. For the 4th this year, Susan and I went to an event up in Frisco. It was held at the Hall Office Park near the Texas Sculpture Garden. We saw the best fireworks we’ve seen in quite a few years. Prior to the fireworks we wandered around and marveled at the size of the event – 20,000 people or something like that. And we listened to a short set of music played by Three Dog Night, an old ’70s-era rock band hired for the event. A couple of the tunes sounded vaguely familiar but it wasn’t exactly Sousa-quality 4th of July music. Based on the last few years’ experience, the best music is to be heard at the Irving event held in Williams Square – where an actual orchestra plays Sousa marches the way God intended.

Meanwhile, mod_virgule development has started up again now that Gary is back on the job. My patch to make Raph’s new diary rating stuff configurable and fix a segfault caused by the new locking code made it into the latest release. More importantly, Gary has completed enough of the merges to completely eliminate one of the mod_virgule forks. Advogato and Badvogato can run off the same code now. We’ve still got some work to do before I’ll be able to get robots.net running on the main code base but hopefully that’s not too far away.

We also saw a couple of movies over the weekend. The Bourne Identity was fairly interesting. The only weird thing was the sound effect used during the fights. Rather than using traditional meat paddles to get a realistic fist-into-flesh sound, they came up with what sounds like someone whacking a piece of plywood with a hammer. So every time there’s a fist-fight, it sounds like the characters are hollow and made of wood. I guess somebody thought it sounded cool. Probably the same people who add those totally unrealistic gunfire noises to movies.

Men in Black II was next on the list. The reviews are pretty much dead on. It’s fairly entertaining but not nearly as good as the first one. They only had about a 30-minute story and somehow managed to pad it out to 90 minutes. And annoyingly, almost all the good stuff was shown in the trailers and ads so there are really no surprises when you see the movie. The Peter Graves cameo was inventive though.

I don’t think I’ve mentioned it previously but we’ve now seen Attack of the Clones at both a traditional film theater and one of the new digital theaters. Bits of it looked better all-digital but other bits, like close-ups of the few live actors, looked better after film transfer. I’d like to see a real film at both to get a better comparison. ATOC is 99% computer animation so it’s kind of hard to judge how badly the lower resolution of the digital theaters is going to affect the quality of movies that are shot on film. As for ATOC itself, I wrote a lengthy review and then deleted it. Too many reviews and nobody really cares anyway, I think. Basically it’s another Star Wars sequel. Like the rest of them, it doesn’t live up to the original, and like Episode 1, most of it has that animated look that makes you feel like you’re watching a cartoon rather than a real movie.