mbstring function overloading: don't use it

As a library author, the worst thing I have to deal with is PHP settings that affect global behaviour. Some examples of this include:

  • Don't rely on a specific locale setting; the library has to keep working in whatever locale the application uses.
  • Don't rely on a specific error_reporting setting to catch errors.
  • If it were 1997: don't rely on a specific magic_quotes or register_globals setting.
  • Don't rely on the current setting of mb_internal_encoding, and instead always pass the desired encodings to the mb_* functions.
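To make that last point concrete, here is a minimal sketch. The helper name truncateUtf8 is made up for illustration, and the example assumes the mbstring extension is loaded and the file is saved as UTF-8. The point is only that the encoding is passed on every call, so mb_internal_encoding() never comes into play.

```php
<?php
/**
 * Hypothetical helper: cut a UTF-8 string down to at most $maxChars
 * characters without consulting mb_internal_encoding().
 */
function truncateUtf8($text, $maxChars)
{
    // The encoding is passed explicitly to every mb_* call.
    if (mb_strlen($text, 'UTF-8') <= $maxChars) {
        return $text;
    }
    return mb_substr($text, 0, $maxChars, 'UTF-8');
}

echo truncateUtf8("héllo wörld", 5), "\n"; // prints "héllo"
```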

Not only should I not rely on these settings, I also can't change them. The application using my library might have a preference for a specific setting, so I can't dictate what the setting should be. The only exception is when I change a setting temporarily and revert it afterwards.
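The "change temporarily and revert" exception could look something like the sketch below. It uses error_reporting as the example setting, and withErrorReporting is a hypothetical helper name, not something from an existing library. The finally block (PHP 5.5 and up) puts the application's setting back even when the callback throws.

```php
<?php
/**
 * Hypothetical helper: run $callback under a temporary error_reporting
 * level, then restore whatever the application had configured.
 */
function withErrorReporting($level, callable $callback)
{
    // error_reporting() returns the previous level when given a new one.
    $previous = error_reporting($level);
    try {
        return $callback();
    } finally {
        error_reporting($previous); // always revert, even on exceptions
    }
}

$before = error_reporting();
$inside = withErrorReporting(E_ALL, function () {
    return error_reporting(); // the temporary level, seen from inside
});
$after = error_reporting();   // same value as $before again
```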

Obviously I'm not perfect, and I'm not aware of every flag that changes the environment. When an incompatibility bug report comes in, I try to quickly fix the code that depends on the setting.

So now I'm faced with a bug report about my library failing when mbstring function overloading is turned on. Definitely something I've missed.

mbstring overloading alters the behaviour of 17 common PHP string functions, such as strpos and substr. With overloading on, strlen() counts characters in the internal encoding rather than bytes. Because I deal with binary data, this fails in a number of places. The only solution is to find every place where I use these functions and replace, for example, strlen($string) with mb_strlen($string, '8bit').
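One way to keep that replacement manageable is to route all byte-oriented calls through a pair of small wrappers. This is only a sketch: byteLength and byteSubstr are hypothetical names, and the function_exists() fallback assumes that without mbstring loaded there is nothing that could overload strlen() and substr().

```php
<?php
/** Hypothetical helper: length of $data in bytes, overloading or not. */
function byteLength($data)
{
    if (function_exists('mb_strlen')) {
        return mb_strlen($data, '8bit'); // '8bit' means one byte per character
    }
    return strlen($data); // mbstring not loaded, so strlen() is untouched
}

/** Hypothetical helper: byte-based substr(), overloading or not. */
function byteSubstr($data, $start, $length = null)
{
    if ($length === null) {
        // "Up to the end": the total byte length is never too short.
        $length = byteLength($data);
    }
    return function_exists('mb_substr')
        ? mb_substr($data, $start, $length, '8bit')
        : substr($data, $start, $length);
}

$binary = "\xC3\xA9\x00\xFF";          // 4 bytes of arbitrary data
$total  = byteLength($binary);         // 4
$middle = byteSubstr($binary, 1, 2);   // "\xA9\x00"
$tail   = byteSubstr($binary, 2);      // "\x00\xFF"
```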

I'm using these functions in a ton of places though. In this case I'm wondering if I should simply throw an error when I detect that function overloading is turned on.
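Such a check could be as small as the sketch below. The function name and the exception message are invented for illustration. It relies on mbstring.func_overload being a bitmask in which the value 2 covers the string functions, and on ini_get() returning false when the setting does not exist at all.

```php
<?php
/**
 * Hypothetical guard: refuse to run when the string functions are
 * overloaded by mbstring.
 */
function assertNoStringOverload()
{
    // ini_get() returns false if the setting is unknown (mbstring not
    // loaded, or a PHP version without this setting); (int) false is 0.
    $flags = (int) ini_get('mbstring.func_overload');

    // Bit value 2 is the one that overloads strlen(), substr(), strpos(), etc.
    if (extension_loaded('mbstring') && ($flags & 2) !== 0) {
        throw new RuntimeException(
            'This library handles binary data and cannot run with ' .
            'mbstring.func_overload enabled for string functions.'
        );
    }
}
```

Calling assertNoStringOverload() once from the library's entry point would turn a subtle data corruption bug into an immediate, readable failure.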

Conclusion

To make a long story short: if you ever intend to use external PHP libraries, there's a very good chance they haven't accounted for mbstring.func_overload. I highly recommend always using the mb_* functions directly and keeping that setting off.


3 Responses to mbstring function overloading: don't use it

  1. Christof 2010-04-23 7:56 pm

    Zabbix ( http://www.zabbix.com/ ) requires the mbstring overloading. It checks it during the install and you have to disable it if you want to use any other PHP application. But I think they plan to remove this in the future, because it isn't really used anyway.

  2. gggeek 2010-04-23 9:37 pm

    Too many ini settings suck, full stop.
    In php's case, it's the writers of libraries and frameworks that suffer from it, but overall the "registry pattern" makes any application unmaintainable.

  3. Shahar Evron 2010-04-25 5:48 am

    I had the same problem with Zend_Http_Client which relies on functions like strlen() and substr() to parse, split and measure streams of bytes - caring only about the byte-length of strings, not really about character length.

    It took a lot of experimentation to figure out the best approach to dealing with PHP environments that had mbstring overloading enabled - in those cases of course the value returned from strlen() for example was not the byte-length of a string, but the string length in characters, which may very well be different.

    Even worse, with Zend_Http_Client I did not want to assume that mbstring is enabled - as many turn it off.

    I ended up wrapping code chunks that do a lot of strlen() and such calls in checks for mbstring overloading, and if detected calling mb_internal_encoding('ASCII') at the beginning of the section, and then mb_internal_encoding($oldEncoding); at the end of it.

    Ugly, but it was the most efficient way in this case.

About

My name is Evert, and I've been writing semi-regularly on this blog since 2006.
