basename() is locale-aware

For years I assumed that:

  $baseName = basename('dir/file');

was just an easy way to do:

  $file = 'dir/file';
  $baseName = substr($file, strrpos($file, '/') + 1);

It turns out basename() does a bit more than just splitting the string at the last slash, because it's locale-aware. In my case I was dealing with a multi-byte UTF-8 string. It took me quite some time to figure out what was going on, because I was testing from the console, which had the en_US.UTF-8 locale, while the bug only appeared under Apache, which defaults to the C locale.
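For comparison, here is a minimal sketch of the byte-only behaviour I originally had in mind. The name byte_basename() is made up for this post and is not part of PHP. It only treats '/' as a separator, so it does not try to match everything basename() does (no suffix argument, no backslash handling on Windows).

```php
<?php
// Hypothetical helper (not part of PHP): a byte-wise basename.
// rtrim(), strrpos() and substr() work on bytes and do not consult the locale.
function byte_basename(string $path): string
{
    // Drop trailing slashes so 'dir/' yields 'dir', similar to basename().
    $trimmed = rtrim($path, '/');
    $pos = strrpos($trimmed, '/');

    // No slash left: the whole string is the last component.
    return $pos === false ? $trimmed : substr($trimmed, $pos + 1);
}

$str = urldecode('%C3%A0fo%C3%B3');

setlocale(LC_ALL, 'C');
echo urlencode(byte_basename($str)), "\n"; // prints %C3%A0fo%C3%B3
```

Because nothing in it looks at the locale, the leading multi-byte character survives even under the C locale.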

Example:

  <?php

  // "àfoó" as UTF-8 bytes
  $str = urldecode('%C3%A0fo%C3%B3');

  // C locale: the leading multi-byte character gets dropped
  setlocale(LC_ALL, 'C');
  echo urlencode(basename($str)) . "\n";

  // UTF-8 locale: the string comes back intact
  setlocale(LC_ALL, 'en_US.UTF-8');
  echo urlencode(basename($str)) . "\n";

  ?>

Output:

  fo%C3%B3
  %C3%A0fo%C3%B3

What bugs me about this is that there was no way for me to know that basename() operates on anything other than bytes. The PHP manual doesn't point this out either. It makes me wonder how many other string functions change behaviour based on the locale.
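If you do want basename() itself, one way to get the same result on the console and under Apache is to pin the locale around the call. The sketch below is only an illustration: basename_in_locale() is a made-up name, it assumes the character-type category (LC_CTYPE) is the one that matters here, and it assumes en_US.UTF-8 is installed on the machine.

```php
<?php
// Hypothetical helper (not part of PHP): run basename() under a given
// LC_CTYPE locale, then restore whatever locale was active before.
function basename_in_locale(string $path, string $locale): string
{
    // Passing '0' only queries the current setting without changing it.
    $previous = setlocale(LC_CTYPE, '0');

    // setlocale() returns false when the requested locale is not available.
    if (setlocale(LC_CTYPE, $locale) === false) {
        throw new RuntimeException("Locale not available: $locale");
    }

    try {
        return basename($path);
    } finally {
        // Put the old locale back, even if basename() were to fail.
        setlocale(LC_CTYPE, $previous);
    }
}

try {
    // Assumption: en_US.UTF-8 exists on this system.
    echo urlencode(basename_in_locale(urldecode('%C3%A0fo%C3%B3'), 'en_US.UTF-8')), "\n";
} catch (RuntimeException $e) {
    echo $e->getMessage(), "\n";
}
```

Keep in mind that setlocale() changes the locale for the whole process, so on a threaded server this kind of switch-and-restore can affect other requests. Setting the locale once at startup is probably the safer choice there.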


3 Responses to basename() is locale-aware

  1. 1323 Sean Coates 2010-03-29 2:49 pm

    Hopefully it bugs you enough to help fix the manual: http://php.net/dochowto

    (-:

    S

  2. 1324 Andy Thompson 2010-03-29 4:24 pm

    basename is also platform aware, which is another reason to use it.

  3. 13090 James Toborg 2011-06-11 3:51 pm

    First-rate writing and seriously helps with comprehending the issue much better.


About

My name is Evert, and I've been writing semi-regularly on this blog since 2006.
