Escaping MySQL strings with no connection available

We're all being drilled over and over again to always use mysqli::escape_string, PDO::quote, or preferably prepared statements when escaping user-supplied strings for use in MySQL queries.

The downside to these methods is that they only work when there's an open connection to a server. So what if there's no connection available? In traditional Unix philosophy I'm writing an export script that doesn't execute SQL statements right to a server, but sends them to stdout. Forcing people to make a connection seems like a major inconvenience.

So what's left? Manual escaping I suppose.. The manual page for mysqli::escape_string mentions: Characters encoded are NUL (ASCII 0), \n, \r, \, ', ", and Control-Z..

  1. function dontHateMe($unescaped) {
  2. $replacements = array(
  3. "\x00"=>'\x00',
  4. "\n"=>'\n',
  5. "\r"=>'\r',
  6. "\\"=>'\\\\',
  7. "'"=>"\'",
  8. '"'=>'\"',
  9. "\x1a"=>'\x1a'
  10. );
  11. return strtr($unescaped,$replacements);
  12. }

There's a risk though.. Certain multi-byte character sets (such as BIG5 and GBK) may still allow for a security hole. You *should* be fine with UTF-8, so make sure you start your file with:

  1. SET CHARACTER SET utf8;

Still no guarantee from my side though. Tread carefully and avoid this if you can. If you have a better idea, or you feel like shouting at me for this.. let's hear it in the comments.

Special thanks to Spudley for providing me with a reasonable answer to this question.

When to escape your data

Two examples of escaping data are the following:

The question I'd like to ask today is, when to do this? There are two possible moments:

  1. Right when the data comes in. For SQL this used to be done with 'magic quotes' quite a bit in PHP-land. In general I don't see this happening a lot anymore for SQL. I do however see data encoded using htmlentities/htmlspecialchars before entering the database.
  2. The other way to go about it, is to only escape when you know how you're going to use it. For example, only call htmlspecialchars right before you echo() your data into your document.

I would personally argue that #2 is the best way to go about things. The first reason is that you don't know exactly how your data might be used in the future. If you pre-encoded everything using htmlentities, but at some point in the future you need the data to be used in an XML feed, you're going to be in trouble. The reason for this, is that the only valid entities in XML are &, <, >, and &quote;. If you are going to need to need to output to CSV, very different rules apply. Other examples are: escaping for urls, escaping for command-line arguments, escaping for javascript and escaping for mime-headers.

In the illustrated example, this is no big disaster. A workaround would be to call htmlspecialchars_decode() or html_entity_decode() first, and then escape for your desired output. A worse case is filtering. If you have been stripping out all, or some html tags before saving it do the database, and later on your decide you wanted to show some of them anyway, that data is now lost.

Conclusion

So my argument is to store raw data. Only encode right before you know where you going to need it. If you're worried about the overhead of escaping right before output in an html page, cache the output.

Whichever route you go, make sure this is clearly documented. There's 2 ways this can go wrong:

  1. Escaping is done on input and output. Now you see literal &'s in your html, or quotes prepended by slashes. (\'hello\').
  2. Escaping is forgotten at both ends. Now you might be vulnerable to SQL injection attacks, XSS attacks or data corruption.

What do you think? I'm especially interested in the other side of the argument.

 1

About

My name is Evert, and I've been writing semi-regularly on this blog since 2006.

I'm currently available for contract work.

more info.

Subscribe

Dropbox

Dropbox is a simple cross-platform online backup and sync application. The first 2GB of space is free, and both you and me get an extra 250MB extra space if you sign up through this link.