When to escape your data

Two common examples of escaping data are escaping a value for an SQL query and escaping text for HTML output.
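As a rough sketch, escaping for HTML output and escaping for an SQL string literal could look like this in PHP. The helper names escapeForHtml and escapeForSql are made up for this illustration, and the in-memory SQLite connection is only an assumption so that the snippet runs without a database server. In real code a prepared statement is usually preferable to quoting by hand.

```php
<?php
// Hypothetical helper: escape a value for use in HTML text or attributes.
function escapeForHtml(string $value): string
{
    return htmlspecialchars($value, ENT_QUOTES, 'UTF-8');
}

// Hypothetical helper: quote a value as an SQL string literal.
// PDO::quote() applies the quoting rules of the connected database driver.
function escapeForSql(PDO $pdo, string $value): string
{
    return $pdo->quote($value);
}

// Assumption: an in-memory SQLite database, used only to get a PDO handle.
$pdo = new PDO('sqlite::memory:');
$name = "O'Reilly <books>";

echo escapeForHtml($name), "\n";
echo escapeForSql($pdo, $name), "\n";
```

The same raw value ends up looking different in each case, which is why it matters at which moment the escaping happens.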

The question I'd like to ask today is, when to do this? There are two possible moments:

  1. Right when the data comes in. For SQL this used to be done with 'magic quotes' quite a bit in PHP-land. In general I don't see this happening a lot anymore for SQL. I do however see data encoded using htmlentities/htmlspecialchars before entering the database.
  2. The other way to go about it is to escape only when you know how you're going to use the data. For example, only call htmlspecialchars right before you echo() your data into your document.

I would personally argue that #2 is the best way to go about things. The first reason is that you don't know exactly how your data might be used in the future. If you pre-encoded everything using htmlentities, but at some point in the future you need the data in an XML feed, you're going to be in trouble. The reason is that the only entities predefined in XML are &amp;, &lt;, &gt;, &quot; and &apos;. If you need to output to CSV, very different rules apply. Other examples are: escaping for urls, escaping for command-line arguments, escaping for javascript and escaping for mime-headers.
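To make the point concrete, here is a minimal sketch of escaping one raw value for several different targets. All of the helper names (forHtml, forXml, forUrl, forShell, forJavascript, forCsv) are hypothetical; the PHP functions they wrap are standard ones. It is an illustration, not a complete escaping library.

```php
<?php
// The raw value, stored exactly as the user entered it.
$raw = 'Tom & "Jerry" <cat@example.org>';

function forHtml(string $v): string
{
    return htmlspecialchars($v, ENT_QUOTES, 'UTF-8');
}

function forXml(string $v): string
{
    // ENT_XML1 limits the output to the entities that XML predefines.
    return htmlspecialchars($v, ENT_QUOTES | ENT_XML1, 'UTF-8');
}

function forUrl(string $v): string
{
    return rawurlencode($v);
}

function forShell(string $v): string
{
    // The exact quoting style depends on the operating system.
    return escapeshellarg($v);
}

function forJavascript(string $v): string
{
    // The JSON_HEX_* flags keep the literal safe inside a <script> block.
    return json_encode($v, JSON_HEX_TAG | JSON_HEX_AMP | JSON_HEX_APOS | JSON_HEX_QUOT);
}

function forCsv(array $fields): string
{
    // fputcsv() takes care of the quoting; a memory stream captures the line.
    $handle = fopen('php://memory', 'r+');
    fputcsv($handle, $fields, ',', '"', '\\');
    rewind($handle);
    $line = stream_get_contents($handle);
    fclose($handle);
    return rtrim($line, "\n");
}

// Escape only at the moment of output, for the target at hand.
echo '<p>' . forHtml($raw) . '</p>', "\n";
```

None of these outputs can be derived reliably from one of the others, so the raw value is the only form worth storing.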

In the illustrated example, this is no big disaster. A workaround would be to call htmlspecialchars_decode() or html_entity_decode() first, and then escape for your desired output. A worse case is filtering. If you have been stripping out all, or some, html tags before saving the data to the database, and later on you decide you want to show some of them anyway, that data is now lost.
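A minimal sketch of that workaround, assuming the stored value was encoded with htmlentities and now has to go into an XML feed. The helper name storedHtmlToXml is hypothetical.

```php
<?php
// Hypothetical helper: recover the raw text from a value that was stored
// HTML-encoded, then escape it again for XML.
function storedHtmlToXml(string $stored): string
{
    $raw = html_entity_decode($stored, ENT_QUOTES | ENT_HTML401, 'UTF-8');
    return htmlspecialchars($raw, ENT_QUOTES | ENT_XML1, 'UTF-8');
}

// Filtering has no such escape hatch: once strip_tags() has run,
// nothing in the stored value says which tags used to be there.
$filtered = strip_tags('<b>bold</b> text');
```

The decode step only works because encoding is reversible. The filtered value carries no trace of the <b> tag, so there is nothing to decode.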

Conclusion

So my argument is to store raw data. Only encode once you know where the data is going, right before you output it. If you're worried about the overhead of escaping right before output in an html page, cache the output.

Whichever route you go, make sure this is clearly documented. There are two ways this can go wrong:

  1. Escaping is done on input and output. Now you see literal &amp;'s in your html, or quotes preceded by slashes (\'hello\').
  2. Escaping is forgotten at both ends. Now you might be vulnerable to SQL injection attacks, XSS attacks or data corruption.
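The first failure is easy to reproduce. In this small sketch the helper escapeHtml is a hypothetical wrapper around htmlspecialchars:

```php
<?php
// Hypothetical helper: an ordinary HTML output escape.
function escapeHtml(string $v): string
{
    return htmlspecialchars($v, ENT_QUOTES, 'UTF-8');
}

$input = 'Fish & Chips';

// Escaped once, on output only: the browser shows "Fish & Chips".
$once = escapeHtml($input);

// Escaped on input and again on output: the browser shows "Fish &amp; Chips".
$twice = escapeHtml(escapeHtml($input));

// Passing false as the fourth argument (double_encode) hides the symptom,
// but it also leaves text alone that was really meant to read "&amp;".
$masked = htmlspecialchars($once, ENT_QUOTES, 'UTF-8', false);
```

Turning off double_encode is a patch over undocumented behaviour rather than a fix, which is one more reason to write down where escaping happens.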

What do you think? I'm especially interested in the other side of the argument.

A case for table-based design

"A topic that has been beaten to death."

Standards advocates have been largely against the use of the <table> tag. The idea is that it's used for positioning and design, while it really should only be used for real tabular data. HTML should be a document with semantic data, and not contain any presentation information.

So the response is that all kinds of new techniques are devised to easily make stretchable designs. I just got handed a template that uses one of these techniques, along with a couple of others. Here's a snippet:

I can't say I blame the author of this code. He's always been told <table>s are evil, and I have no doubt many 'experts' will suggest techniques like this.

There's no doubt this could be made much simpler even without using the <table>, but there's no way we can expect every junior frontend developer to memorize "1001 css hacks to make divs behave as a table".

The truth is, very few html documents are parsed by anything other than browsers and search engines. If an application's data is also consumed by other clients, it will almost always be done through some kind of API or standard xml/json document. Even if html is used as a transport format, it will most likely be a specialized format.

Keeping HTML pure for data and CSS for presentation is a bit of a pipedream that never worked out. Even if you look at a relatively simple application such as WordPress, every theme will have its own HTML template, and not just a separate css stylesheet.

The point of my story is: HTML is read by browsers and developers. If you can make a brilliant HTML document and still keep pixel precision, more power to you, but please keep things legible for the future developer who might need to fix a bug.

The upshot

CSS3 has support for multiple backgrounds, which will eliminate a lot of these problems. Safari has already supported this for a while, Firefox will get it with the 3.6 release, and Opera in 10.5. This leaves the browser that shall not be named.

Game of life with checkboxes

I needed to kill a little bit of time, so I decided to write Conway's Game of Life in HTML.

Try it!

HTML Purifier rocks!

I had to create an RSS aggregator for my job, and I had to find (or create) a good tool that sanitizes the HTML that comes in. I stumbled upon HTML purifier, and I haven't seen a better tool for the job yet.

Some of the features:

  • It can turn the html into valid XHTML (transitional or strict)
  • In doing so, it also balances out tags.
  • Removes any code that could expose a security risk (tested with RSnake's XSS cheat sheet).
  • Allows you to truncate HTML (if you don't want to show an entire post) and still results in proper HTML!

So yeah, if you need something similar, I'd suggest you check it out.

Sharing sessions between html and flash

This has been an issue that has been driving me pretty crazy. I can't seem to find out how to share a (cookie-)session between flash and php.

The problem is that in certain situations Flash ignores session cookies when sending requests. The situations I know of are Flash uploads and using Flash Remoting in Internet Explorer.

I asked my question on #webappsec and on the web application security mailing list, but there wasn't really anybody who could answer my question.

Options

  1. I can pass the session id using flashvars directly. The problem with this is that the session id is directly embedded into the html and can therefore be stolen using XSS.
  2. I can use a temporary token, but anybody who has this token can do everything the user can in the flash application. For just the uploads it can work, but for everything else it's not really flexible, and doesn't really fix the problem.
  3. I could turn off httponly cookies and pass the session id using javascript straight to the flash movie. This could be my only option, but I dislike it because it's not as transparent as it should be and requires additional logic using javascript and flash (and php).
  4. Force the user to log in when using flash. Not really a nice solution from a usability perspective.
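For option 2, one way to limit the damage is to make the token short-lived and valid for a single purpose only. What follows is a sketch under assumptions, not a recommendation from any of the lists mentioned above: the helper names issueToken and verifyToken, the '|' separated token layout and the server-side secret are all hypothetical. It does not remove the limitation described in option 2. Anyone who obtains the token can still act within its scope until it expires.

```php
<?php
// Hypothetical helper: issue a signed token for one user and one purpose
// (for example "upload"). $userId and $purpose must not contain '|'.
function issueToken(string $secret, string $userId, string $purpose, int $ttl, int $now): string
{
    $payload = $userId . '|' . $purpose . '|' . ($now + $ttl);
    return $payload . '|' . hash_hmac('sha256', $payload, $secret);
}

// Hypothetical helper: returns the user id if the token is authentic,
// was issued for $purpose and has not expired; null otherwise.
function verifyToken(string $secret, string $purpose, string $token, int $now): ?string
{
    $parts = explode('|', $token);
    if (count($parts) !== 4) {
        return null;
    }
    [$userId, $tokenPurpose, $expires, $mac] = $parts;

    // Check the signature first, using a timing-safe comparison.
    $expected = hash_hmac('sha256', $userId . '|' . $tokenPurpose . '|' . $expires, $secret);
    if (!hash_equals($expected, $mac)) {
        return null;
    }
    if ($tokenPurpose !== $purpose || (int) $expires < $now) {
        return null;
    }
    return $userId;
}
```

The page would embed something like issueToken($secret, $userId, 'upload', 300, time()) in the flashvars, and the upload script would call verifyToken() instead of relying on the session cookie. Because the token carries a user id rather than the session id, the session id itself never ends up in the html.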

I'm wondering how other people go about this. Is there a satisfying solution at all? Or can it only be done using a combination of nasty hacks?

About

My name is Evert, and I've been writing semi-regularly on this blog since 2006.
