Posts from January 2010

Zend Framework Cron Tasks in Parallel

At the end of my recent post on building a cron service for Zend Framework applications, I mentioned a couple of weaknesses in my approach, most notably the lack of any kind of locking mechanism. This post shows how to fix that.

Cron tasks in Zend Framework apps

I recently took the opportunity to build a simple cron task manager for this blog; since the resulting system could easily be adapted to other Zend Framework applications, I figured I'd better share.

2009 in review

It's been an interesting year. I realize New Year's has already come and gone, but I thought it'd be worth writing some last 2009ish thoughts anyway, just for posterity.

Keeping your listeners in order

A couple of days ago I blogged about how Doctrine's SoftDelete behavior can keep other listeners' preDelete() hooks from firing; after a bit of coding this morning, I believe I have a solution.

Know thy bottlenecks

One of my projects at work lately has been a searchable index of about 80,000 images, each involving about 20 fields' worth of metadata. It's a Drupal project, so it was pretty easy to set up the appropriate content types, fields, and so forth, but when it came time to set up searching, I made a few regrettable assumptions that cost me a lot of time.

Given the record count, I decided it didn't make much sense to use Drupal's core search functionality; I was under the impression that the core search just grepped through the contents of the node table, and would therefore not perform particularly well. That's regrettable assumption #1. Regrettable assumption #2 is simpler: I didn't think search would ever perform well as long as the index was stored in the database.

As a result, I went on an odyssey of sorts looking for replacement search engines. Some of the contenders:

Apache Solr from Acquia
Apache Solr is a Java-based search indexing platform with a supporting Drupal module, and as it happens, the Drupal support company Acquia provides a hosted Solr service that can be leveraged by subscribers. We do have an Acquia subscription, but unfortunately we also have hundreds of Drupal sites, and the subscription doesn't quite cover that many.
Self-hosted Apache Solr
We've occasionally considered setting up our own Solr instance as a service for our users around campus, but the administrative overhead doesn't really fit our schedules just yet. So again, I moved on.
Search Lucene API
Unlike the two Solr-based options, the Search Lucene API module handles its search indexing via PHP (specifically, via Zend_Search_Lucene). It also has a pretty good selection of helper modules available for things like faceted search, content suggestion, and so forth.

Of the three options, Search Lucene API seemed like the best choice with the least administrative overhead. Over the next couple of weeks I hacked away amid intermittent user support requests, slowly but surely piecing together the necessary components for a killer faceted search system. Once I was ready to try it, I started to import the content. Node by node it arrived, and the search kept on scaling successfully as it went. Pleased as punch, I went home for the evening so that the rest of the records could import.

The next morning my inbox was stuffed to the brim with out-of-memory errors from Drupal cron runs. I checked the search index settings; the system had managed to pull in around 33,000 records, but indexing had ground to a halt. It was so bad that I couldn't even access the index statistics page to tell it to rebuild. And this on a system with 112MB dedicated to PHP.

I was confused. I'd never experienced scaling problems with Zend Framework components before, and I couldn't imagine that Drupal added that much overhead. Not wanting to admit defeat, I posted an issue. Soon, the maintainer politely informed me that Search Lucene API was only intended to scale up to about 10,000 records, and fewer than that if they were particularly complicated.

It would seem I was hosed. However, I realized that there was one more contender I hadn't quite considered yet:

Drupal core search
Drupal comes with a built-in search module, and it's supported by any number of contributed helper modules providing the functionality it doesn't have on its own (e.g., faceted search).

Despairing of all other hopes, I turned off Search Lucene API and turned on the core search module with the appropriate helpers …and it handled everything without a hiccup!

As it turns out, Drupal's core search is a lot smarter than I'd given it credit for. Yes, it's searching against the database, but not the node table …it has a special search index table that is built up on cron runs, just as the other search modules do. With that in mind, it's no surprise that it's a lot faster than I had expected …plus, it doesn't introduce nearly the same PHP memory overhead as Search Lucene API, because a lot of the heavy lifting is offloaded to the database server (which, in our case, is more than powerful enough).
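To make the difference between "grepping the node table" and "querying a prebuilt index" concrete, here's a rough sketch of the general idea in Python. It is not Drupal's implementation or schema; the SearchIndex class, its cron_run() and search() methods, and the simple tokenizing rule are all hypothetical, chosen only to show expensive indexing work being done in small batches on each cron run so that a query only has to look up words.

```python
import re
from collections import defaultdict


def tokenize(text):
    """Lowercase and split on non-word characters (a deliberately simple rule)."""
    return [w for w in re.split(r"\W+", text.lower()) if w]


class SearchIndex:
    """Hypothetical word -> node-id index, filled in batches the way a cron job might."""

    def __init__(self):
        self.postings = defaultdict(set)  # word -> set of node ids containing it
        self.indexed = set()              # node ids already processed

    def cron_run(self, nodes, batch_size=100):
        """Index up to batch_size not-yet-indexed nodes; return how many were added."""
        pending = [nid for nid in sorted(nodes) if nid not in self.indexed]
        batch = pending[:batch_size]
        for nid in batch:
            for word in tokenize(nodes[nid]):
                self.postings[word].add(nid)
            self.indexed.add(nid)
        return len(batch)

    def search(self, query):
        """Return ids of indexed nodes containing every query word (AND semantics)."""
        words = tokenize(query)
        if not words:
            return set()
        result = set(self.postings.get(words[0], set()))
        for word in words[1:]:
            result &= self.postings.get(word, set())
        return result
```

With nodes = {1: "Sunset over the quad", 2: "Quad in winter", 3: "Library at sunset"}, two calls to cron_run(nodes, batch_size=2) index everything, after which search("sunset") returns {1, 3} and search("quad sunset") returns {1}. The point is that each query touches only a few small word lists instead of every node's text, and the indexing cost is spread across cron runs.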

The moral of the story? Know thy bottlenecks. If I had realized how well Drupal's core search performed I never would have tried to optimize it out of the equation, and I would have saved myself a significant amount of development time. Good to know; lesson learned; hope this helps someone else.

When is a DELETE not a DELETE?

In my recent post on using Zend_Acl with Doctrine record listeners, I described a way to automate a Doctrine-based application's access control logic based on certain event hooks in Doctrine's record listener system. I still think it's a fairly elegant approach, but as I've been working with it, I discovered one behavior I didn't quite expect.

As it happened, one of the models on which I was using this technique also implemented Doctrine's core SoftDelete behavior. With SoftDelete enabled, calling $record->delete() doesn't actually remove the record from the database; instead, it adds a deleted_at column, sets it, and then adjusts all your other queries to treat any record with a deleted_at value as though it isn't there. In other words, all SQL DELETEs become UPDATEs, and all SQL SELECTs get an extra WHERE clause that ensures no "deleted" records are ever returned unless you explicitly ask for them. Pretty ingenious, really; it's nice if you think you'll ever need to recover from accidental deletions (though fortunately I haven't had to use it yet).

However, I recently discovered something I probably should have expected in the first place: when I called my SoftDelete-powered record's delete() method, my record listener's preDelete() hook wasn't firing; after some further research I discovered that it was firing preUpdate() instead.

As it turns out, since SoftDelete turns what would have been a SQL DELETE operation into a SQL UPDATE, the pre- and postDelete hooks are overridden with their *Update equivalents (at least after SoftDelete's own delete hooks have finished up). The unfortunate side effect? Since the preUpdate hook allows users with the "update" permission to proceed, users who had "update" could delete records, even if they didn't have the "delete" permission. Not a great setup, all things considered.

Now, I do have other protections in place. For one, I'm not only checking permissions at the model layer; my controllers still have a few remaining ACL checks so that users aren't shown interfaces they won't actually be able to use. That said, I'd love to find a workaround for this, especially if I ever release any of this code for public use.

At the moment the only thing I can think to try is to figure out a way to ensure that my record listener is registered earlier in the stack than SoftDelete is. I'm not sure this is possible with how Doctrine behaviors are registered, but I figure it's worth some experimentation. I'll let you know how it goes.
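Here's a toy model in Python of why registration order could matter. It is not how Doctrine dispatches record listener hooks; SoftDeleteListener, AclListener, and fire() are hypothetical stand-ins, and each listener is allowed to rewrite the event for the listeners after it, which roughly mimics SoftDelete turning a delete into an update. Whether Doctrine actually lets you place your own listener ahead of SoftDelete is exactly the open question.

```python
class SoftDeleteListener:
    """Stand-in for SoftDelete: rewrites a delete into an update for later listeners."""

    def handle(self, event, record):
        if event == "pre_delete":
            record["deleted_at"] = "now"  # placeholder timestamp
            return "pre_update"
        return event


class AclListener:
    """Stand-in ACL check: maps the hook it receives to a required privilege."""

    REQUIRED = {"pre_delete": "delete", "pre_update": "update"}

    def __init__(self, allowed):
        self.allowed = set(allowed)
        self.seen = []  # hooks this listener actually received

    def handle(self, event, record):
        self.seen.append(event)
        needed = self.REQUIRED[event]
        if needed not in self.allowed:
            raise PermissionError(f"missing '{needed}' privilege")
        return event


def fire(listeners, event, record):
    """Call listeners in registration order; each may rewrite the event it passes on."""
    for listener in listeners:
        event = listener.handle(event, record)
    return event
```

With SoftDeleteListener registered first, an AclListener that only allows "update" lets a delete through, because it only ever sees pre_update. Register the AclListener first and it sees pre_delete, demands the "delete" privilege, and raises PermissionError before the record is touched.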
