HTML Purifier rocks!

I had to build an RSS aggregator for my job, so I needed to find (or write) a good tool for sanitizing the incoming HTML. I stumbled upon HTML Purifier, and I haven't seen a better tool for the job yet.

Some of the features:

  • It can turn the HTML into valid XHTML (Transitional or Strict).
  • So it also balances tags.
  • Removes any code that could pose a security risk (tested with RSnake's XSS cheat sheet).
  • Lets you truncate HTML (if you don't want to show an entire post) and still end up with proper HTML!

So yeah, if you need something similar, I'd suggest you check it out.


4 Responses to HTML Purifier rocks!

  1. Stoyan 2007-10-23 2:54 pm

    How exactly do you truncate HTML? I wasn't able to find the method in the docs.
    Is there a way to truncate only the text (ignoring the length of the HTML tags)?

  2. Evert 2007-10-23 3:59 pm

    Hey Stoyan,

    I simply do a substr on the text. Nothing fancy.

  3. Thierry Schellenbach 2007-10-28 8:15 am

    This is indeed a really nice tool. I needed to secure a templating system, great stuff :)

  4. Edward Z. Yang 2007-11-01 7:27 pm

    Stoyan: Generally, I recommend that people use strip_tags on the HTML and then use a smart string truncator. (Don't forget to properly escape the data on final output!) There is usually no need for the HTML to be shown in such cases.

    The behavior that Evert is describing probably has to do with HTML Purifier's tag balancing capabilities: a fragment like asdf whose opening tag was left unclosed (with presumably the rest truncated) comes out with the missing closing tag added.
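
    To make the strip, truncate, escape order from the comment above concrete, here is a minimal sketch. It is written in Python rather than PHP, purely as an illustration. The helpers `strip_tags`, `truncate_words`, and `summary` are hypothetical names made up for this example; they are not part of HTML Purifier, and this `strip_tags` only approximates the PHP function of the same name. It also assumes the input has already been sanitized, so there is no script or style content left to leak into the text.

```python
from html import escape
from html.parser import HTMLParser


class _TextExtractor(HTMLParser):
    """Collects only the text nodes of an HTML fragment."""

    def __init__(self):
        # convert_charrefs=True turns entities such as &amp; into plain text.
        super().__init__(convert_charrefs=True)
        self.parts = []

    def handle_data(self, data):
        self.parts.append(data)


def strip_tags(html_text):
    """Hypothetical helper: drop all tags and keep the text content."""
    parser = _TextExtractor()
    parser.feed(html_text)
    parser.close()  # flushes any text still buffered by the parser
    return "".join(parser.parts)


def truncate_words(text, limit, suffix="..."):
    """Hypothetical helper: cut at the last space within the first `limit` characters."""
    text = " ".join(text.split())  # collapse runs of whitespace
    if len(text) <= limit:
        return text
    cut = text[:limit].rsplit(" ", 1)[0]
    return cut + suffix


def summary(html_text, limit=40):
    """Strip tags, truncate the plain text, then escape it for output."""
    return escape(truncate_words(strip_tags(html_text), limit))
```

    With these definitions, `summary("<p>Hello <b>world</b>, this is a long post</p>", limit=20)` returns `Hello world, this...`. Because the tags are removed before cutting, the tag length never counts against the limit, and there is nothing left to balance afterwards.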


About

My name is Evert, and I've been writing semi-regularly on this blog since 2006.
