Going to PHPBenelux


Thanks to my awesome employer, I'll be heading to the PHPBenelux conference in Antwerp this weekend. Even though I've been doing PHP stuff for a bit, this is actually the first time I'm going to a PHP-specific conference. The schedule looks pretty interesting, so I'll be pretty busy.

If you're going and want to say hi or drink some beers on Friday or Saturday, drop me a line!

My gripe with Prototype

Many of you might already know this, but I wanted to point out why I think using the Prototype JavaScript library is a bad idea. The biggest problem is actually highlighted in its name: it changes many of the prototypes of core JavaScript types.

You might have noticed this before, when you tried 'for(i in arr)' and came across the many extra functions Prototype added (and by then you should have realized this wasn't the proper way to loop through an array anyway).

This is a big difference from other established libraries such as jQuery. If you want to use any of the jQuery functionality, you're expected to wrap other types in a jQuery object, for example:

  var myElem = $(myDomNode);

This wraps the DOM node in a jQuery object that carries the jQuery functionality; the node itself is left untouched. Besides the '$' (which can be turned off), jQuery pretty much keeps its hands off your global namespace.

The same goes for YUI. All functionality is accessed through the YUI object:

  YUI().use('node-base', function(Y) {
      Y.on("domready", function() { console.log('ready!'); });
  });

This is in stark contrast to Prototype. As soon as you include it, it changes basic types such as strings, arrays and numbers. An example:

  alert( [1, 2, 3].toJSON() ); // outputs "[1, 2, 3]"

From an API perspective this seems quite nice, and it is by far the simplest approach: Prototype provides these handy methods right where you need them.

This has one devastating effect, though. It violates the holy "don't pollute the global namespace" rule, because the built-in prototypes are shared by every script on the page. In an isolated environment this will work fine, but as soon as you work on an application that includes scripts from different sources or libraries, those scripts are also affected by Prototype's changes to core types. In a "mash-up era" it's just not feasible to assume you'll be working in an isolated, sterile environment forever.

The latest time I had to hunt down which Prototype feature was causing trouble was when I tried to use the JSON.stringify function. This is a fairly new feature, supported by all modern browsers.

Whenever stringify comes across an object that has a toJSON() method, it will call it and serialize the return value instead. This allows objects to specify their own 'JSON representation', for example to filter out 'private' properties.

Example:

  var test = {
      prop1: 'val1',
      privateProp: 'hidden',
      toJSON: function() {
          // Only expose prop1 when serialized
          return { prop1: this.prop1 };
      }
  };
  alert(JSON.stringify(test));

The output of this last example will be :

  {"prop1":"val1"}

I would argue that this functionality is not a great design decision (separation of concerns!). However, it's there and it's standard. But Prototype adds a toJSON() method to every Array, Object and String, and in Prototype it has a different meaning: Prototype's toJSON() methods JSON-encode the value themselves and return a string.

From an API perspective this is as bad a choice as JSON.stringify relying on toJSON(). And this problem highlights exactly why it's a bad idea: the native JSON implementation and Prototype both claim the same toJSON method name, and each gives it its own meaning.

Example of how this fails:

  JSON.stringify({
      prop : [1, 2, 3, 4]
  });

The normal result:

  {"prop":[1,2,3,4]}

The result with prototype:

  {"prop":"[1, 2, 3, 4]"}
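To make the double encoding concrete, here is a small Python analogy. It is not JavaScript and not how any browser implements JSON.stringify; `stringify` is a hypothetical function that follows only the toJSON rule (if a value has a toJSON method, serialize whatever it returns), and `PrototypeArray` is a hypothetical stand-in for an Array after Prototype has patched it. Because the patched toJSON returns an already-encoded string, the serializer faithfully encodes that string a second time.

```python
import json


def stringify(value):
    """Tiny analogue of JSON.stringify, reduced to the toJSON rule."""
    to_json = getattr(value, "toJSON", None)
    if callable(to_json):
        # Serialize whatever toJSON returns instead of the value itself
        value = to_json()
    if isinstance(value, dict):
        items = (json.dumps(str(k)) + ":" + stringify(v) for k, v in value.items())
        return "{" + ",".join(items) + "}"
    if isinstance(value, list):
        return "[" + ",".join(stringify(v) for v in value) + "]"
    return json.dumps(value)


class NativeArray(list):
    """A plain array without a toJSON method."""


class PrototypeArray(list):
    """Hypothetical array carrying a Prototype-style toJSON."""

    def toJSON(self):
        # Prototype's meaning: return the array already encoded as a string
        return "[" + ", ".join(json.dumps(v) for v in self) + "]"


print(stringify({"prop": NativeArray([1, 2, 3, 4])}))     # {"prop":[1,2,3,4]}
print(stringify({"prop": PrototypeArray([1, 2, 3, 4])}))  # {"prop":"[1, 2, 3, 4]"}
```

The real JSON.stringify also passes the property key to toJSON and handles many more types; the sketch leaves all of that out.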

The easy fix is to simply get rid of the toJSON functions, like this:

  delete Object.prototype.toJSON;
  delete Array.prototype.toJSON;
  delete Hash.prototype.toJSON;
  delete String.prototype.toJSON;

There's even a comment on Stack Overflow that fixes the issue while keeping Prototype's methods intact, but I know that as long as I maintain applications that use Prototype, I'll have to deal with API collisions and incompatibilities.

Therefore, Prototype will never be my JS library of choice.

iCalendar / vCard parser for PHP

I've just finished an iCalendar / vCard parser for PHP. It has an almost completely 'natural', SimpleXML-like interface, so parsing and modifying iCalendar / vCard objects (.ics / .vcf files) should (hopefully) be just as easy.

To install using pear, run the following:

  pear channel-discover pear.sabredav.org
  pear install sabredav/Sabre_VObject-alpha

Or download from pear.sabredav.org.

For testing, I used this iCalendar file: icalendartest.ics.

To load an object, use the Reader class:

  // Adjust the path if you downloaded the package manually
  include 'Sabre/VObject/includes.php';

  // Reading an object
  $calendar = Sabre_VObject_Reader::read(file_get_contents('icalendartest.ics'));

iCalendar objects consist of components (VEVENT, VTODO, VTIMEZONE, etc), properties (SUMMARY, DESCRIPTION, DTSTART, etc) and parameters, which are to properties what attributes are to elements in XML. To show a listing of all events in a calendar, this snippet would work:
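To make the component / property / parameter structure concrete, here is a deliberately simplified Python sketch. It is not Sabre_VObject's code, and the function names are hypothetical. It splits an unfolded content line such as `DTSTART;TZID=Europe/Amsterdam:20101231T200000` into a name, a parameter dictionary and a value, and nests components using their BEGIN/END lines. It assumes no line folding and no quoted parameter values containing ':' or ';', both of which a real parser has to handle.

```python
def parse_content_line(line):
    """Split one unfolded content line into (NAME, {PARAM: value}, value)."""
    head, _, value = line.partition(":")  # value starts after the first colon
    name, *raw_params = head.split(";")
    params = {}
    for raw in raw_params:
        key, _, val = raw.partition("=")
        params[key.upper()] = val
    return name.upper(), params, value


def parse_components(text):
    """Build nested components (VCALENDAR > VEVENT ...) from BEGIN/END lines."""
    root = {"name": "ROOT", "properties": [], "children": []}
    stack = [root]
    for line in text.splitlines():
        if not line:
            continue
        name, params, value = parse_content_line(line)
        if name == "BEGIN":
            component = {"name": value.upper(), "properties": [], "children": []}
            stack[-1]["children"].append(component)
            stack.append(component)
        elif name == "END":
            stack.pop()
        else:
            stack[-1]["properties"].append((name, params, value))
    return root["children"]


SAMPLE = (
    "BEGIN:VCALENDAR\r\nVERSION:2.0\r\nBEGIN:VEVENT\r\n"
    "SUMMARY:Birthday party\r\n"
    "DTSTART;TZID=Europe/Amsterdam:20101231T200000\r\n"
    "END:VEVENT\r\nEND:VCALENDAR\r\n"
)
calendar = parse_components(SAMPLE)[0]
event = calendar["children"][0]
print(event["name"], event["properties"][1])
```

In this sketch, TZID ends up in the parameter dictionary of the DTSTART property, which is the same relationship the PHP interface exposes through array syntax further down.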

  echo "There are ", count($calendar->vevent), " events in this calendar\n";

  // Looping through events
  foreach ($calendar->vevent as $event) {
      echo (string)$event->dtstart, ": ", $event->summary, "\n";
  }

You can easily modify properties:

  $calendar->vevent[0]->description = "It's a birthday party";

Creating new objects uses the following syntax:

  $todo = new Sabre_VObject_Component('vtodo');
  $todo->summary = 'Take out the dog';
  $calendar->add($todo);

And to turn your newly modified calendar back into an ics file:

  file_put_contents('output.ics', $calendar->serialize());

Lastly, parameters are accessible through array-syntax:

  echo (string)$calendar->vevent[0]->dtstart['tzid'], "\n";

I had fun building this, and I hope it's useful to you as well. It's 100% unit tested, but bugs might still appear due to the complex nature of the API. Use at your own risk :). This library will be part of the SabreDAV project, which is also where you can get the source, report bugs or make suggestions.

slowdeath - a simple denial of service attack for most PHP-based servers

The problem with Apache's approach to handling multiple clients is that there's only ever a limited number of client processes available. On common webservers this is usually around a few hundred.

Because of this, HTTP requests need to be handled as quickly as possible: as soon as a process has handled one request, it can move on to the next. If a client happens to have a slow connection, this has a direct effect on the scalability of your frontend server.

A common way to fight this is to put a caching reverse proxy, such as Varnish or Squid, in front of your webserver. These servers are better suited to dealing with many clients. Your Apache server can then send its HTTP responses quickly to the reverse proxy, and let the proxy deal with sending them on to the client.

However, this doesn't help with slow requests. To avoid having to buffer large request bodies, these proxy servers generally open connections directly to the backend webserver and pass the request through.

Because PHP installations generally use Apache's 'prefork' MPM, the number of possible concurrent connections is fairly low. This is often also the case with FastCGI-based webservers, such as nginx and lighttpd. So if you were able to just open a few hundred connections and drip in the bytes of the request body, it would be very easy to take these servers down.

To test this theory, I wrote a simple Python script that does exactly this; you can grab it from GitHub. To use it, try something like this:

  python slowdeath.py --threads 200 http://localhost/

In my case, the webserver was limited to 150 connections. It took about a second for it to stop serving requests.

Big warning: This tool is for research purposes only. Use at your own risk, and only on servers you own.

To take out a server, simply specify a number of threads higher than the MaxClients or whatever setting your webserver happens to use. Note that I only tested this on a few servers, so results may vary. Side effects include diarrhea, rashes, blackouts and death. Do not use while driving.

Internationalized domain names, are you ready?

Since May, 11 TLDs (top-level domain names) have been added. For these to work properly, a lot of applications will have to be fixed.

Many email-validation scripts might use an approach like this:

  $ok = preg_match('/^[A-Z0-9._%+-]+@[A-Z0-9.-]+\.[A-Z]{2,6}$/i', $email);

This one is pretty simple: it matches the most common address formats, as long as the TLD (.com, .nl, .uk, etc.) consists of 2 to 6 letters. For a bit more sophistication, you might want to make sure the TLD is actually a known one:

  $ok = preg_match('/^[A-Z0-9._%+-]+@[A-Z0-9.-]+\.(?:[A-Z]{2}|com|org|net|edu|gov|mil|biz|info|mobi|name|aero|asia|jobs|museum)$/i', $email);

Note: both of these regexes were taken from regular-expressions.info, the top Google hit, and they are decent examples.
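A quick way to see why these patterns break on the new domains is to run them against an internationalized address. The following Python sketch translates both PHP patterns to Python's `re` module (`/.../i` becomes `re.IGNORECASE`); the sample addresses are illustrative.

```python
import re

# The two patterns from the text, translated from PHP's /.../i syntax
SIMPLE = re.compile(r"^[A-Z0-9._%+-]+@[A-Z0-9.-]+\.[A-Z]{2,6}$", re.IGNORECASE)
STRICTER = re.compile(
    r"^[A-Z0-9._%+-]+@[A-Z0-9.-]+\."
    r"(?:[A-Z]{2}|com|org|net|edu|gov|mil|biz|info|mobi|name|aero|asia|jobs|museum)$",
    re.IGNORECASE,
)

for address in [
    "someone@example.com",
    "example@xn--9n2bp8q.xn--9t4b11yi5a",  # Punycode form of an IDN address
    "example@실례.테스트",                   # the same address in UTF-8
]:
    print(address, bool(SIMPLE.match(address)), bool(STRICTER.match(address)))
```

Only the .com address passes either pattern: the UTF-8 form contains non-ASCII characters, and the Punycode TLD contains digits and dashes and is longer than 6 characters.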

The new TLDs use non-ASCII characters, and they might become aliases for existing top-level domains, or new TLDs altogether. Here are the currently working examples:

At first sight these look like regular UTF-8 characters, but if you look at the source code of this page, you'll notice that they are actually encoded differently.

The Korean URL http://실례.테스트 is actually encoded as http://xn--9n2bp8q.xn--9t4b11yi5a/. This encoding is called Punycode.
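To try the translation outside a browser, Python's standard library happens to ship an "idna" codec that applies Punycode label by label and adds the `xn--` prefix. A minimal sketch (Python rather than PHP, purely for illustration; the helper names are hypothetical, and the codec implements the IDNA 2003 rules of RFC 3490):

```python
def to_ascii(hostname):
    """UTF-8 hostname -> ASCII (Punycode) form, e.g. for DNS lookups."""
    return hostname.encode("idna").decode("ascii")


def to_unicode(hostname):
    """ASCII (Punycode) hostname -> Unicode form, e.g. for display."""
    return hostname.encode("ascii").decode("idna")


print(to_ascii("실례.테스트"))
print(to_unicode("xn--9n2bp8q.xn--9t4b11yi5a"))
```

Plain ASCII names such as example.com pass through unchanged, so the conversion can be applied to every hostname.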

If you want to support these new URLs (and thus domain names in email addresses), you need support for Punycode. You will likely receive UTF-8 encoded domain names in email addresses (example@실례.테스트), but internally you must make sure that you only deal with the Punycode representation.

Modern browsers do the same translation. If you paste "http://xn--9n2bp8q.xn--9t4b11yi5a/" directly into the Firefox address bar, it will show you the UTF-8 characters instead. Firefox still re-encodes the name to Punycode, though, and uses that form for HTTP requests.

The best way to really check for valid email addresses is to use a very liberal regex, and then use a simple MX record lookup to verify that a mail server exists for the given domain. This example is an expansion of the first regex:

  $email = 'example@xn--9n2bp8q.xn--9t4b11yi5a';

  if (preg_match('/^[A-Z0-9._%+-]+@([A-Z0-9.-]+\.[A-Z0-9-]{2,})$/i', $email, $matches)) {
      $hostname = $matches[1];
      // getmxrr() returns true when at least one MX record was found
      if (getmxrr($hostname, $hosts)) {
          echo "Host has an MX record\n";
      } else {
          echo "Host does not exist or does not have an MX record\n";
      }
  } else {
      echo "Email address did not match regular expression\n";
  }

The preceding code does not convert UTF-8 to Punycode, though. There's not yet an easy native way to do this in PHP, but PEAR's Net_IDNA2 provides one. Its implementation seems very complex, though, and leaves me wondering if there's an easier way to go about it.
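For comparison, in Python the conversion step is small enough to combine with the liberal regex. This is a sketch under a few assumptions: `normalize_email` is a hypothetical helper, the local part is left as-is (only the domain is converted), and the MX lookup is left out because Python's standard library has no MX resolver (a third-party DNS library would be needed for that step).

```python
import re

# Same liberal pattern as the PHP example; fullmatch() anchors both ends
LIBERAL = re.compile(r"[A-Z0-9._%+-]+@([A-Z0-9.-]+\.[A-Z0-9-]{2,})", re.IGNORECASE)


def normalize_email(email):
    """Return the address with its domain in Punycode form, or None if it
    doesn't look like an email address."""
    local, sep, domain = email.rpartition("@")
    if not sep:
        return None
    try:
        # Built-in IDNA 2003 codec: UTF-8 labels become xn-- labels
        ascii_domain = domain.encode("idna").decode("ascii")
    except UnicodeError:
        # Empty or over-long labels, or characters IDNA rejects
        return None
    candidate = local + "@" + ascii_domain
    return candidate if LIBERAL.fullmatch(candidate) else None


print(normalize_email("example@실례.테스트"))
```

The returned domain part is what you would then hand to the MX lookup, so the rest of the application only ever sees the Punycode form.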

SabreDAV 1.3.0 released

I just released version 1.3.0 of SabreDAV. Uptake has been very strong, especially for the CalDAV components. The biggest change is a big performance boost for most tree operations.

To upgrade, download the new file here, or if you installed it using pear:

  pear upgrade sabredav/Sabre_DAV
  pear upgrade sabredav/Sabre_CalDAV

To install using pear:

  pear channel-discover pear.sabredav.org
  pear install sabredav/Sabre_DAV
  pear install sabredav/Sabre_CalDAV

There are 4 (smallish) backwards compatibility breaks in the API. You can read about them in the migration guide.

Full list of changes:

  • Added: Cache layer in the ObjectTree.
  • Added: childExists method to Sabre_DAV_ICollection. This is an API break, so if you implement Sabre_DAV_ICollection directly, add the method.
  • Changed: Almost all HTTP method implementations now take a uri argument, including events. This allows for internal rerouting of certain calls. If you have custom plugins, make sure they use this argument. If they don't, they will likely still work, but it might get in the way of future changes.
  • Changed: All getETag methods MUST now surround the ETag with double quotes. This was a mistake made in all previous SabreDAV versions. If you don't do this, any If-Match, If-None-Match and If: headers using ETags will work incorrectly. (Issue 85).
  • Added: Sabre_DAV_Auth_Backend_AbstractBasic class, which can be used to easily implement basic authentication.
  • Removed: Sabre_DAV_PermissionDenied class. Use Sabre_DAV_Forbidden instead.
  • Removed: Sabre_DAV_IDirectory interface, use Sabre_DAV_ICollection instead.
  • Added: Browser plugin now uses {DAV:}displayname if this property is available.
  • Added: Tree classes now have a delete and getChildren method.
  • Fixed: If-Modified-Since and If-Unmodified-Since would be incorrect if the date is an exact match.
  • Fixed: Support for multiple ETags in If-Match and If-None-Match headers.
  • Fixed: Improved baseUrl handling.
  • Fixed: Issue 67: Non-seekable stream support in ::put()/::get().
  • Fixed: Issue 65: Invalid dates are now ignored.
  • Updated: Refactoring in Sabre_CalDAV to make everything a bit more legible.
  • Fixed: Issue 88, Issue 89: Fixed compatibility for running SabreDAV on Windows.
  • Fixed: Issue 86: Fixed Content-Range top-boundary from 'file size' to 'file size'-1.

I plan to keep fully supporting the 1.2.* branch, but I'll backport bugfixes strictly on an on-demand basis. So far relatively few people have been stuck on older versions, so I'm only spending time on it in case anyone depends on it.

Thanks to all the people reporting bugs and posting patches!

Ubuntu has a new font

Along with the release of 10.10, Ubuntu came with a new self-named font. I love it. It's quirky, yet very legible.

The font is open source, with a pretty straightforward license that comes down to: 'include this license when redistributing'. There are very few good free fonts out there that actually allow you to embed them on your site, but this one does.

You can download the TTF files from here. Embedding them using CSS is easy:

  @font-face {
      font-family: "Ubuntu Sans";
      src: url('font/ubuntu/Ubuntu-R.ttf');
  }
  @font-face {
      font-family: "Ubuntu Sans";
      src: url('font/ubuntu/Ubuntu-B.ttf');
      font-weight: bold;
  }
  @font-face {
      font-family: "Ubuntu Sans";
      src: url('font/ubuntu/Ubuntu-I.ttf');
      font-style: italic;
  }
  @font-face {
      font-family: "Ubuntu Sans";
      src: url('font/ubuntu/Ubuntu-BI.ttf');
      font-style: italic;
      font-weight: bold;
  }

This looked brilliant in Firefox right away, but Safari acts a bit weird, only anti-aliasing some of the text after you hover over it.

Be aware, though, that this will add about 1.3MB to your page. If you don't need some of the italic or bold variations, I'd recommend leaving them out.

On font and copyrights

On a more serious note, many people don't know that most fonts you buy for your websites may never be embedded directly into web pages. I've seen a number of people embedding their fonts with either @font-face, the dirty (but impressive) Cufón, or the worst of all worlds: sIFR.

Technically, with any of these technologies you are not just using the font, but redistributing it. When you buy a font, you are basically only allowed to use it to generate static images. This might not be a big deal for your personal site, but it's not a wise thing to do for commercial sites.

Killing a dead ssh connection

One telnet feature I always missed in ssh was the ^] shortcut, which gives you a way to terminate the connection.

ssh has a similar feature: escape sequences. Press Enter and then type '~.' to terminate the connection (the escape character is only recognized at the start of a line). You can also set the escape character explicitly in your ~/.ssh/config:

  Host *
      EscapeChar ~

You can change the character here too, but ~ is the default and a sensible one.

If you're dealing with crappy ssh connections that often drop, you can add the following to make the client send a keep-alive message every 60 seconds:

  Host *
      ServerAliveInterval 60

Evercookie: the cookie that just won't die

Samy, famous for his worm, released evercookie this week. Evercookie stores cookies in various storage mechanisms, such as Flash Local Shared Objects (also known as flookies), HTML5 storage mechanisms and even the browser history and cache. When any of these are wiped by the user, the script repopulates them, making it very hard to get rid of the cookies.

This technique is commonly used to circumvent users' privacy wishes (something Clearspring was recently sued for), but evercookie puts it in overdrive.

One good use for it is banning users. In the past I've used IP addresses + cookies to make sure a user stays banned, but it doesn't take much to change your IP address and clear your cookies. All these techniques together make it a lot harder to get around a ban. Because Flash stores its flookies in a central place in the operating system, the cookies often even carry over to multiple browsers and private browsing sessions.

Most of all, I think the tool is made to make a point. It's very hard for the average user to clear all the tracking information. It should be doable with a press of a button, without losing all your settings and history for every other site.

Content Security Policy introduction

I blogged about Content Security Policy about 2 years ago, when it was still called 'Site Security Policy'. It started as a specification and an add-on, and turned into a patch a bit later. Finally it made it into Firefox 4 beta 1. I think CSP is the next web security revolution, so make yourself aware of how it works and what its implications are.

So what is it? The short version is that it's a very effective measure against cross-site scripting. By specifying a policy through the 'X-Content-Security-Policy' HTTP header, you can specify exactly from which locations you accept JavaScript and other content. This allows you to block scripts from any domains unknown to you, and to block inline scripts altogether.

A simple example

  X-Content-Security-Policy: allow 'self'

A simple PHP example to see this in action:

  <?php
  header("X-Content-Security-Policy: allow 'self'");
  ?>
  <html>
  <head>
    <title>CSP test</title>
  </head>
  <body>
    <script type="text/javascript">
      alert('XSS!');
    </script>
  </body>
  </html>

If the above code is opened in Firefox 4.0 beta1, the script will not execute, and a warning is added to the "Error Console" (in the Tools menu).

Not only does this header block inline scripts, it also blocks the following:

  • eval(). This is important for people using eval() to parse JSON responses.
  • setTimeout and setInterval if the function is provided as a string.
  • javascript: urls
  • HTML event attributes (onclick, onload, etc.).
  • All images, plugin objects (flash, quicktime etc.), audio, video, html frames and fonts not served from the same domain as the html page.
  • XMLHttpRequest to domains other than the source domain.

Fortunately, there are fine-grained controls over what you want to allow from which domains. Here is an example from the specification:

  X-Content-Security-Policy: allow 'self'; img-src *; \
      object-src media1.com media2.com *.cdn.com; \
      script-src trustedscripts.example.com

This example starts with "allow 'self'", allowing only content from the same domain. The "img-src *" rule allows images from any domain. "object-src media1.com media2.com *.cdn.com" allows <object> tags to use files from media1.com, media2.com, subdomains of cdn.com and the same domain as the HTML was served from, and "script-src trustedscripts.example.com" only accepts scripts from that host. To learn more about these, I recommend taking a good look at the directives list in the specification.
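The policy syntax itself is simple: directives are separated by semicolons, and each directive is a name followed by space-separated sources. Here is a small Python sketch that builds and splits such a policy string; the helper names are hypothetical, and it does not handle the comma-separated values used by the 'options' directive.

```python
def build_policy(directives):
    """Join (directive, [sources]) pairs into an X-Content-Security-Policy value."""
    return "; ".join(name + " " + " ".join(sources) for name, sources in directives)


def parse_policy(policy):
    """Split a policy string back into (directive, [sources]) pairs."""
    pairs = []
    for part in policy.split(";"):
        tokens = part.split()  # directive name first, then its sources
        if tokens:
            pairs.append((tokens[0], tokens[1:]))
    return pairs


EXAMPLE_DIRECTIVES = [
    ("allow", ["'self'"]),
    ("img-src", ["*"]),
    ("object-src", ["media1.com", "media2.com", "*.cdn.com"]),
    ("script-src", ["trustedscripts.example.com"]),
]
policy = build_policy(EXAMPLE_DIRECTIVES)
print("X-Content-Security-Policy: " + policy)
```

Keeping the policy as data like this makes it easy to review which domains each resource type may come from, which is the first preparation step suggested at the end of this post.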

Options and reporting

Using the 'options' directive, it's possible to turn specific blocked features back on. Valid values for options are 'eval-script' and 'inline-script'.

  X-Content-Security-Policy: allow 'self'; options inline-script, eval-script

The preceding example allows inline scripts (using html event attributes, or the script tag) as well as the 'eval()' function. In general I would try to avoid this though.

When a security rule is violated, it's possible to get the browser to send a report back to the server. For example, if an image is referenced from a blocked domain, the browser can send a simple report to a url you specify.

  X-Content-Security-Policy: allow 'self'; report-uri http://example.org/cspreport.php

This allows you to detect any problems with your policy, or successful attempts by your evil users to inject code. An example of such a report is the following:

  {
    "csp-report": {
      "request": "GET http://index.html HTTP/1.1",
      "request-headers": "Host: example.com\nUser-Agent: Mozilla/5.0 (Macintosh; U; Intel Mac OS X 10.5; en-US; rv:1.9.3a5pre) Gecko/20100601 Minefield/3.7a5pre\nAccept: text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8\nAccept-Language: en-us,en;q=0.5\nAccept-Encoding: gzip,deflate\nAccept-Charset: ISO-8859-1,utf-8;q=0.7,*;q=0.7\nKeep-Alive: 115\nConnection: keep-alive",
      "blocked-uri": "http://evil.com/some_image.png",
      "violated-directive": "img-src 'self'",
      "original-policy": "allow 'none'; img-src *, allow 'self'; img-src 'self'"
    }
  }
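The browser POSTs this JSON document to the report-uri. The endpoint in the example is a PHP script; as a language-neutral illustration, here is a Python sketch of what such an endpoint might do. `summarize_csp_report` and `CSPReportHandler` are hypothetical names, and the handler is only defined here, not started. To run it, pass it to `http.server.HTTPServer` and call `serve_forever()`.

```python
import json
import logging
from http.server import BaseHTTPRequestHandler


def summarize_csp_report(body):
    """Pick the fields most useful for logging out of a CSP report body.
    Raises ValueError if the body is not a JSON object with a csp-report."""
    data = json.loads(body)  # json.JSONDecodeError is a ValueError subclass
    report = data.get("csp-report") if isinstance(data, dict) else None
    if not isinstance(report, dict):
        raise ValueError("not a CSP report")
    return {
        "blocked-uri": report.get("blocked-uri"),
        "violated-directive": report.get("violated-directive"),
    }


class CSPReportHandler(BaseHTTPRequestHandler):
    """Receives reports POSTed by the browser and logs a short summary."""

    def do_POST(self):
        length = int(self.headers.get("Content-Length", 0))
        body = self.rfile.read(length).decode("utf-8", "replace")
        try:
            logging.warning("CSP violation: %s", summarize_csp_report(body))
        except ValueError:
            self.send_response(400)  # not something we can interpret
        else:
            self.send_response(204)  # accepted, nothing to send back
        self.end_headers()


SAMPLE = json.dumps({"csp-report": {
    "blocked-uri": "http://evil.com/some_image.png",
    "violated-directive": "img-src 'self'",
}})
print(summarize_csp_report(SAMPLE))
```

Logging only the blocked URI and the violated directive keeps the log readable; the full report can be stored too if you need the request headers for debugging.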

Final notes

Using CSP does not mean you can go easy on other security measures. At the moment only a very small number of users have a browser that supports CSP, so everybody else still needs to be protected. However, it's still a great idea to implement. Your Firefox users will automatically be better protected, and thanks to the reporting functionality they automatically help you detect holes, which benefits everybody.

My guess is that CSP is going to be very important, and is here to stay. There are two things you can do to prepare for the future:

  1. Figure out your policy. It's a good idea for your web application to know where its resources are coming from anyway. Advertisers in particular tend to be bad at this, using many different domains and scripts that load other scripts.
  2. Try to avoid any inline scripting, html event handlers and eval(). They are all avoidable, and in my opinion it is a good idea to keep your javascript out of html anyway. This is a big one, because both inline scripts and html events are still very popular. With the popularity of libraries such as jQuery, I do think it will be easier to just grab most of the inline scripts and move them to an external script.

About

My name is Evert, and I've been writing semi-regularly on this blog since 2006.

I'm currently available for contract work.


