Output Transformation in a Zend Framework Model Layer

A few weeks back, Matthew Weier O'Phinney wrote a very helpful discussion of model layer infrastructure using various components of the Zend Framework. I especially appreciated his advice on using Zend_Form as an input filter inside the model class itself; it provides a very clean way to keep validation and filtering logic properly encapsulated.

Zend_Form's use of Zend_Filter and Zend_Validate also makes it very easy to get precisely the filtering and validation rules you need. You can even filter through an external library like HTMLPurifier if you find you need the extra functionality, just by writing a new filter class; this has already been covered quite well (for example, see Part 8, Step 3 of Pádraic Brady's Zend Framework blog tutorial). As Weier O'Phinney demonstrates, you can then use this Zend_Form object as a screening filter in your model class, so that certain properties must always pass through the form's validation process before they are set in the model itself. I won't duplicate his logic here either, but you should definitely take a look at it.

However, I've run into a minor problem, and I'm not sure my solution is ideal. See, the Zend_Form approach described above does a great job of implementing Chris Shiflett's Filter Input, Escape Output principle: user input is filtered for invalid HTML before it's ever saved to the model, and can then be escaped as appropriate in the view layer. But what happens if you need to be able to retrieve the user's original unfiltered input later?

That might not sound like an appropriate thing to do, but consider this. Suppose that instead of simply sanitizing user-contributed HTML, you wanted to allow your users to use a simpler text input format (such as Markdown) and generate the HTML for them later? It wouldn't be appropriate to save the generated HTML to the model, since your users would then be unable to retrieve their original Markdown version for later editing. However, if you don't pre-generate the HTML, then you can't perform your HTMLPurifier sanitizing at the input stage either, since there isn't any HTML to sanitize yet.

In this situation, it looks to me like you'd be stuck doing all your input filtering in the presentation (output) layer, which doesn't really dovetail well with Shiflett's principle. But then again, there do appear to be two distinct types of "filtering" at work here, one of which is what Shiflett was talking about, and the other of which probably isn't:

  1. Sanitization, or making sure that user input doesn't contain any security risks.
  2. Transformation, or converting user input for presentational purposes. (I feel like this is different from escaping, since escaping is mainly concerned with defusing special characters?)
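To make the distinction concrete, here is a minimal sketch in plain PHP. All three function names are hypothetical, strip_tags() is only a crude stand-in for a real sanitizer such as HTMLPurifier, and the one-rule "Markdown" converter is a toy rather than a real Markdown implementation.

```php
<?php
// Sanitization: remove risky markup at input time. strip_tags() is a
// crude stand-in; a real application would use a proper HTML sanitizer.
function demo_sanitize($input)
{
    return strip_tags($input);
}

// Transformation: convert stored text into presentational HTML.
// Toy rule only: *word* becomes <em>word</em>.
function demo_transform($text)
{
    return preg_replace('/\*([^*]+)\*/', '<em>$1</em>', $text);
}

// Escaping: defuse special characters so the text is shown literally,
// for example when putting the original back into an edit form.
function demo_escape($text)
{
    return htmlspecialchars($text, ENT_QUOTES, 'UTF-8');
}

$raw     = 'Say "hi" to *Jerry* <script>alert(1)</script>';
$stored  = demo_sanitize($raw);     // what gets saved to the model
$display = demo_transform($stored); // HTML for the page
$editBox = demo_escape($stored);    // safe text for a <textarea>
```

Note that only the first step can run before the data is saved; the other two both operate on the stored original and differ only in purpose.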

So what do you think? It's clear that sanitization ought to be done immediately upon input (preferably in the form object), but where should transformation happen?

Rob Allen's Zend Framework Overview from last year hints at implementing things like Markdown formatting in the view layer through the use of view helpers. This is certainly appropriate from a strict MVC perspective, as output transformation is definitely presentation-layer stuff. However, this isn't particularly DRY; every time you wrote a view script utilizing this data, you'd need to remember to run it through the appropriate chain of output filters.
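A view helper along those lines might look roughly like the sketch below. The class name My_View_Helper_Markdown and the single emphasis rule are my own assumptions for illustration; a real helper would delegate to an actual Markdown library, and the helper path and prefix would need to be registered with the view.

```php
<?php
// Hypothetical view helper following the Zend Framework 1 naming
// convention: the class name ends in the helper name, and the method
// shares that name in camelCase.
class My_View_Helper_Markdown
{
    public function markdown($text)
    {
        // Escape first so no raw HTML survives, then apply the toy rule
        // (*word* becomes <em>word</em>). Not a real Markdown parser.
        $escaped = htmlspecialchars($text, ENT_QUOTES, 'UTF-8');
        return preg_replace('/\*([^*]+)\*/', '<em>$1</em>', $escaped);
    }
}

// In a view script this would be called as $this->markdown($post->body).
// Here the helper is called directly so the sketch runs on its own.
$helper = new My_View_Helper_Markdown();
$html   = $helper->markdown('Read *this* & <b>that</b>');
```

The catch is visible in that usage comment: nothing forces a view script to make the call, so a forgotten helper means untransformed output.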

So, my best overall idea (building on Weier O'Phinney's examples) is to implement it in the getters in my model:

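<?php
class My_Model
{
  // ...

  /**
   * Route property reads through a matching getter when one exists,
   * so that $model->body goes through getBody() and its output filter.
   */
  public function __get($property)
  {
    $method = 'get' . ucfirst($property);
    if (method_exists($this, $method)) {
      return $this->$method();
    }
    if (array_key_exists($property, $this->_data)) {
      return $this->_data[$property];
    }
    return null;
  }

  /**
   * Return the body, transformed for HTML output by default.
   * Pass false to get the stored value without output transformation.
   */
  public function getBody($applyOutputFilter = true)
  {
    $body = $this->_data['body'];
    if ($applyOutputFilter) {
      $body = $this->getOutputFilter()->filter($body);
    }
    return $body;
  }

  /**
   * Build the chain of output (transformation) filters.
   */
  public function getOutputFilter()
  {
    $filterChain = new Zend_Filter();
    // add specific filter objects as appropriate, and then...
    return $filterChain;
  }
  // ...
}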

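This guarantees that whenever the "body" is accessed as a property, it's correctly transformed for HTML output (a sensible default), while an explicit call to getBody(false) still returns the stored original, for example to repopulate an edit form.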
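Here is a small standalone sketch of how that plays out in use. Demo_Model and Demo_EmphasisFilter are hypothetical stand-ins of mine (the filter imitates only the filter() method of a Zend_Filter chain, with a toy emphasis rule), so that the example runs without the framework.

```php
<?php
// Hypothetical stand-in for a Zend_Filter chain; only filter() is imitated.
class Demo_EmphasisFilter
{
    public function filter($value)
    {
        // Toy rule: *word* becomes <em>word</em>.
        return preg_replace('/\*([^*]+)\*/', '<em>$1</em>', $value);
    }
}

// Hypothetical model using the same accessor pattern as My_Model.
class Demo_Model
{
    protected $_data = array('body' => 'Some *Markdown* text');

    public function __get($property)
    {
        $method = 'get' . ucfirst($property);
        if (method_exists($this, $method)) {
            return $this->$method();
        }
        return array_key_exists($property, $this->_data)
            ? $this->_data[$property]
            : null;
    }

    public function getBody($applyOutputFilter = true)
    {
        $body = $this->_data['body'];
        if ($applyOutputFilter) {
            $body = $this->getOutputFilter()->filter($body);
        }
        return $body;
    }

    public function getOutputFilter()
    {
        return new Demo_EmphasisFilter();
    }
}

$model      = new Demo_Model();
$forDisplay = $model->body;           // property access: transformed HTML
$forEditing = $model->getBody(false); // explicit call: stored original
```

View scripts only ever touch $model->body, so they get the transformed version without having to remember anything; the edit form is the one place that asks for the original explicitly.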

However, both of these approaches still leave us with the same core problem: you almost inevitably end up doing all your input filtering at the presentation stage, rather than prior to saving it to the persistence layer as is usually recommended. This can be a security risk if you're not careful, and is almost certainly a performance hit for the average visiting user, since the same filters are rerun every time the data is read.

Any ideas on how best to resolve these issues?
