OS/X WebDAV and Chunked Transfer Encoding
While OS/X's WebDAV implementation is quite slow, it is mostly pretty decent. The client uses the little-used chunked transfer encoding for PUT requests, which allows it to start sending a big file without knowing in advance exactly how big it is going to be. Such a request looks like this:
PUT /image.png HTTP/1.1
Host: example.org
User-Agent: WebDAVFS/1.8 (01808000) Darwin/10.2.0 (i386)
Accept: */*
X-Expected-Entity-Length: 10316
If: (<opaquelocktoken:44445502-c253-02e6-7198-45b36c96e8c7>)
Connection: close
Transfer-Encoding: Chunked
While this is a perfectly legal HTTP request, some webservers choke on it. Both Nginx and Lighttpd respond with HTTP 411 Length Required. That would have been a valid response from an HTTP/1.0 server, but a server that claims to support HTTP/1.1 must accept these requests.
Apache + mod_php handles this fine, but just recently I got a report from someone using Apache + fastcgi + php. In that case the request body never arrived in PHP, which can unfortunately result in silent data loss.
So consider this a bit of a warning: so far OS/X WebDAV only plays nicely with Apache + mod_php servers.
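To make the wire format a bit more concrete, here is a minimal sketch of how a chunked body is framed and decoded. The function name `decodeChunkedBody` is made up for this example and trailers are ignored. Normally the webserver does this decoding before PHP ever sees the body, so this is only meant to show what the client is sending.

```php
<?php
// Hypothetical helper (not part of any library): decodes a body sent with
// Transfer-Encoding: chunked. Each chunk is "<hex size>\r\n<data>\r\n" and a
// zero-sized chunk ends the body. Trailers are ignored in this sketch.
function decodeChunkedBody(string $raw): string
{
    $body = '';
    $offset = 0;
    while (true) {
        $lineEnd = $offset <= strlen($raw) ? strpos($raw, "\r\n", $offset) : false;
        if ($lineEnd === false) {
            throw new InvalidArgumentException('Missing chunk size line');
        }
        // The size line may carry extensions after a semicolon; drop them.
        $sizeLine = substr($raw, $offset, $lineEnd - $offset);
        $size = (int) hexdec(trim(explode(';', $sizeLine)[0]));
        if ($size === 0) {
            return $body; // last chunk
        }
        $body .= substr($raw, $lineEnd + 2, $size);
        // Skip the chunk data and the CRLF that follows it.
        $offset = $lineEnd + 2 + $size + 2;
    }
}

$decoded = decodeChunkedBody("4\r\nWiki\r\n5\r\npedia\r\n0\r\n\r\n");
```

A server-side script could compare strlen() of the body it actually received with the X-Expected-Entity-Length header shown above to detect the silent data loss case.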
WebDAV-related RFCs
In an attempt to better understand the WebDAV standards space, I made up a non-scientific graph of all the specs and dependencies. I'd like to get started with CalDAV, but I have a few other specs to implement before I'll be able to do that.

The next one for me on the list is ACL. Attempting to integrate these new features within the existing system so far has proven to be very challenging. The big reason is my (perhaps high) requirements on how this is supposed to work:
- It shouldn't touch the existing WebDAV system (at all), because 99% of the users will not use ACL.
- The interface & implementation should still be understandable if you are implementing ACL.
- I like the existing WebDAV class structure as it stands, so if I have to make changes in the design, it should still be easy to grasp.
HTTP Basic and Digest authentication with PHP
HTTP authentication is quite popular for web applications. It is pretty easy to implement and works with a wide range of HTTP clients, not to mention your browser.
Basic Auth
The two main authentication schemes are 'basic' and 'digest'. Basic is pretty easy to implement and appears to be the most common:
<?php
$username = null;
$password = null;
// mod_php
if (isset($_SERVER['PHP_AUTH_USER'])) {
    $username = $_SERVER['PHP_AUTH_USER'];
    $password = $_SERVER['PHP_AUTH_PW'];
// most other servers
} elseif (isset($_SERVER['HTTP_AUTHORIZATION'])) {
    if (stripos($_SERVER['HTTP_AUTHORIZATION'], 'basic ') === 0) {
        // Split on the first colon only, the password itself may contain colons
        $decoded = base64_decode(substr($_SERVER['HTTP_AUTHORIZATION'], 6));
        list($username, $password) = array_pad(explode(':', $decoded, 2), 2, null);
    }
}
if (is_null($username)) {
    header('WWW-Authenticate: Basic realm="My Realm"');
    header('HTTP/1.0 401 Unauthorized');
    echo 'Text to send if user hits Cancel button';
    die();
} else {
    // Escape user-supplied values before echoing them back
    echo '<p>Hello ' . htmlspecialchars($username) . '.</p>';
    echo '<p>You entered ' . htmlspecialchars((string) $password) . ' as your password.</p>';
}
?>
Well, it's a bit more involved than it should be, I suppose. You might also have noticed that the username and password are sent over the wire using base64 encoding, which anyone can reverse. That's not really secure unless you have SSL in place.
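To show how little protection base64 offers, here is a small sketch that pulls the credentials straight out of an Authorization header value. The helper name `parseBasicAuth` is made up for this example.

```php
<?php
// Hypothetical helper: extracts the credentials from a Basic Authorization
// header value. base64_decode() is all an eavesdropper needs.
function parseBasicAuth(string $header): ?array
{
    if (stripos($header, 'basic ') !== 0) {
        return null;
    }
    $decoded = base64_decode(substr($header, 6), true);
    if ($decoded === false || strpos($decoded, ':') === false) {
        return null;
    }
    // Split on the first colon only: the password itself may contain colons.
    return explode(':', $decoded, 2);
}

// "YWRtaW46MTIzNA==" is simply base64 for "admin:1234".
list($user, $pass) = parseBasicAuth('Basic YWRtaW46MTIzNA==');
```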
Digest
Digest is designed to be more secure. The password is never sent over the wire in plain text, but rather as a hash. Because a hash is one-way, it cannot be turned back into the password. We can only validate the hash by applying the same hash function to the password we have. If the hashes match, the password was correct.
Let's first see how Digest auth should work:
Client requests url
GET / HTTP/1.1
Server requires authentication
HTTP/1.1 401 Unauthorized
WWW-Authenticate: Digest realm="The batcave",
    qop="auth",
    nonce="4993927ba6279",
    opaque="d8ea7aa61a1693024c4cc3a516f49b3c"
Client authenticates
GET / HTTP/1.1
Authorization: Digest username="admin",
    realm="The batcave",
    nonce=49938e61ccaa4,
    uri="/",
    response="98ccab4542f284c00a79b5957baaff23",
    opaque="d8ea7aa61a1693024c4cc3a516f49b3c",
    qop=auth, nc=00000001,
    cnonce="8d1b34edb475994b"
Information coming from the server:
| Field | Description |
| --- | --- |
| realm | A string which will be used within the UI and as part of the hash. |
| qop | Can be auth and auth-int and has influence on how the hash is created. We use auth. |
| nonce | A unique code, which will be used within the hash and needs to be sent back by the client. |
| opaque | This can be treated as a session id. If this changes, the browser will deauthenticate the user. |
Information from the client:
| Field | Description |
| --- | --- |
| username | The supplied username. |
| realm | Same as server response. |
| nonce | Same as server response. |
| uri | The uri being requested. |
| response | The validation hash. |
| opaque | Same as server response. |
| qop | Same as server response. |
| nc | Nonce-count. This is a hexadecimal serial number for the request. The client should increase this number by one for every request. |
| cnonce | A unique id generated by the client. |
So how do we know if the password was correct? We can validate using the following formula (pseudo code).
A1 = md5(username:realm:password)
A2 = md5(request-method:uri) // request method = GET, POST, etc.
Hash = md5(A1:nonce:nc:cnonce:qop:A2)
if (Hash == response)
//success!
else
//failure!
Or, using PHP:
<?php
$realm = 'The batcave';
// Just a random id
$nonce = uniqid();
// Get the digest from the http header
$digest = getDigest();
// If there was no digest, show login
if (is_null($digest)) requireLogin($realm, $nonce);
$digestParts = digestParse($digest);
// If the digest was incomplete, show login as well
if ($digestParts === false) requireLogin($realm, $nonce);
$validUser = 'admin';
$validPass = '1234';
// Based on all the info we gathered we can figure out what the response should be
$A1 = md5("{$validUser}:{$realm}:{$validPass}");
$A2 = md5("{$_SERVER['REQUEST_METHOD']}:{$digestParts['uri']}");
$validResponse = md5("{$A1}:{$digestParts['nonce']}:{$digestParts['nc']}:{$digestParts['cnonce']}:{$digestParts['qop']}:{$A2}");
// Strict comparison, so hex strings are never compared as numbers
if ($digestParts['response'] !== $validResponse) requireLogin($realm, $nonce);
// We're in!
echo 'Well done sir, you made it all the way through the login!';

// This function returns the digest string, or null if there is none
function getDigest() {
    $digest = null;
    // mod_php
    if (isset($_SERVER['PHP_AUTH_DIGEST'])) {
        $digest = $_SERVER['PHP_AUTH_DIGEST'];
    // most other servers
    } elseif (isset($_SERVER['HTTP_AUTHORIZATION'])) {
        if (stripos($_SERVER['HTTP_AUTHORIZATION'], 'digest ') === 0) {
            $digest = substr($_SERVER['HTTP_AUTHORIZATION'], 7);
        }
    }
    return $digest;
}

// This function forces a login prompt
function requireLogin($realm, $nonce) {
    header('WWW-Authenticate: Digest realm="' . $realm . '",qop="auth",nonce="' . $nonce . '",opaque="' . md5($realm) . '"');
    header('HTTP/1.0 401 Unauthorized');
    echo 'Text to send if user hits Cancel button';
    die();
}

// This function extracts the separate values from the digest string
function digestParse($digest) {
    // protect against missing data
    $needed_parts = array('nonce'=>1, 'nc'=>1, 'cnonce'=>1, 'qop'=>1, 'username'=>1, 'uri'=>1, 'response'=>1);
    $data = array();
    preg_match_all('@(\w+)=(?:(?:")([^"]+)"|([^\s,$]+))@', $digest, $matches, PREG_SET_ORDER);
    foreach ($matches as $m) {
        // $m[2] holds a quoted value, $m[3] an unquoted one
        $data[$m[1]] = $m[2] !== '' ? $m[2] : $m[3];
        unset($needed_parts[$m[1]]);
    }
    return $needed_parts ? false : $data;
}
?>
As you can see, we need a plain-text version of the password in order to validate the user. Storing plain-text passwords is not a good idea, so it's strongly recommended to store the result of $A1 instead.
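A minimal sketch of that idea: the A1 hash is computed once when the password is set, and every later check works from the stored hash only. The function names `digestA1` and `digestResponse` are made up for this example. Keep in mind that the stored value is tied to the realm, and that anyone who obtains it can authenticate for that realm, so it still needs to be protected.

```php
<?php
// Hypothetical helpers, a sketch rather than a complete implementation.

// Computed once, when the password is set. This is the value that gets stored.
function digestA1(string $user, string $realm, string $password): string
{
    return md5("{$user}:{$realm}:{$password}");
}

// Computed on every request from the stored A1 and the parsed header fields
// (the same keys that digestParse() returns).
function digestResponse(string $a1, string $method, array $parts): string
{
    $a2 = md5("{$method}:{$parts['uri']}");
    return md5("{$a1}:{$parts['nonce']}:{$parts['nc']}:{$parts['cnonce']}:{$parts['qop']}:{$a2}");
}
```

The login check then becomes a comparison between digestResponse($storedA1, $_SERVER['REQUEST_METHOD'], $digestParts) and the response value the client sent.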
Security improvements
- It's smart to validate the contents of opaque, nonce and realm. If you have the data stored on the server, why not check it?
- The nc should be an ever-increasing number. You could store the number and track it to make sure it doesn't make any big jumps. You don't want to be extremely strict about the sequence though, because a number might get skipped and requests could come in out of order.
- 'qop' is quality of protection. This serves as an integrity check for the request. An attacker could steal all your HTTP Digest headers and simply change the body to make the request do something else. If 'qop' is set to 'auth', only the request method and the requested uri are taken into consideration. If 'qop' is 'auth-int', the body of the request is also used in the hash: A2 = md5(request-method:uri:md5(request-body)).
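Two of these points can be sketched in a few lines. The function names and the $maxJump default are assumptions made for this example. The nonce-count check is only the loose sanity check described above; it does not by itself stop a request from being replayed with the same count.

```php
<?php
// Hypothetical helpers illustrating two of the checks above.

// A2 for both qop modes. With auth-int the hash of the body is included.
function digestA2(string $method, string $uri, string $qop, string $body = ''): string
{
    if ($qop === 'auth-int') {
        return md5("{$method}:{$uri}:" . md5($body));
    }
    return md5("{$method}:{$uri}");
}

// Lenient nonce-count check: reject malformed values and big jumps, but
// tolerate skipped numbers and requests that arrive slightly out of order.
// $maxJump is an arbitrary value picked for this sketch.
function isPlausibleNonceCount(int $lastSeen, string $ncHex, int $maxJump = 50): bool
{
    if (!preg_match('/^[0-9a-f]+$/i', $ncHex)) {
        return false;
    }
    $nc = (int) hexdec($ncHex);
    return $nc > 0 && abs($nc - $lastSeen) <= $maxJump;
}
```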
Apache speed and reverse proxies
In our environment we use Apache everywhere. Its PHP integration has so far proven superior. Now we're dealing with higher loads and we've hit some limitations.
One of the problems we had is Apache's heaviness. Our apache2 worker processes eat up around 20 megabytes of memory each, which with 3 GB of memory brings us to a MaxClients setting of around 150. Rasmus seems to think that's a pretty high setting, but based on the easy calculation (memory available for apache / size of an apache process) it works out for us.
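The easy calculation, written out as a tiny sketch. The function name is made up for this example and the figures are the ones quoted above; measure your own process size before relying on them.

```php
<?php
// Back-of-the-envelope MaxClients estimate:
// memory available for apache / size of an apache process.
function estimateMaxClients(int $memoryForApacheMb, int $processSizeMb): int
{
    return intdiv($memoryForApacheMb, $processSizeMb);
}

echo estimateMaxClients(3 * 1024, 20), "\n"; // 153, rounded down to 150 for some headroom
```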
Effectively this means we can serve approximately that many parallel requests on this machine. It is therefore very much in our interest to get every response out as quickly as possible, increasing the number of requests we can handle per second.
Going beyond this 150 number could cause Linux to start using swap. This is bad, because it will add latency to the response, which in turn will result in connections staying open longer.
Since we're sending everything over the web, there is a baseline latency. Information traveling to the other side of the globe will take at least 67ms, because we're restricted to the speed of light. This doesn't even take indirect routes or other hardware latency into account. According to Till this all adds up to the time a single Apache process takes up before working on the next request.
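The 67ms figure can be reproduced with a quick calculation. This sketch assumes a path of roughly 20,000 km (half the earth's circumference) and the speed of light in vacuum; the constant and function names are made up for this example. Real links are slower than this lower bound.

```php
<?php
// Lower bound for one-way latency to the far side of the globe, assuming a
// direct path along the surface at the speed of light in vacuum.
const HALF_EARTH_CIRCUMFERENCE_KM = 20000; // roughly 40,000 km / 2
const SPEED_OF_LIGHT_KM_PER_S = 299792;

function minimumLatencyMs(float $distanceKm): float
{
    return $distanceKm / SPEED_OF_LIGHT_KM_PER_S * 1000;
}

printf("%.1f ms\n", minimumLatencyMs(HALF_EARTH_CIRCUMFERENCE_KM)); // 66.7 ms
```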
The reverse proxy
There are a couple of webservers which seem to be optimized for serving lots of clients. Lighttpd got a lot of traction earlier, but the project seems to have slowed down a lot, as the much anticipated 1.5 release has been under development for almost 2 years. nginx seems to have taken its place in terms of disruptiveness. These servers are much more lightweight, and are supposed to be faster at delivering static files.
Much like Till, we've had issues hooking PHP directly into these servers. Till suggests the solution of actually placing nginx in front of Apache (on the same machine) as a reverse proxy. Nginx takes care of serving static files and proxies any PHP request to Apache. The concept is that Apache can push out the response as quickly as possible, and while Nginx is working on delivering it to the (slow) client Apache can take on other work.
The thing that bothers me about this setup is the need for 2 webserver products to achieve a single task. This implies that neither of them is adequate on its own to do the job.
On the other hand, this type of setup is also what a lot of people seem to be doing by placing Squid in front of their webservers, although that tends to happen on separate hardware.
HTTP/1.1 100 Continue
All of a sudden we noticed that a problem we saw earlier with Lighttpd (Bug #1017) was also an issue in nginx (I couldn't find a bug report, or a bug tracker at all). Neither of them seems to support the Expect: 100-continue header. While no browser actually sends this header, we have webservices running which are directly accessed by other types of HTTP clients. Losing support for this HTTP functionality would instantly break their applications, which is unacceptable.
So now we're actually looking at Squid for performing that task. Squid is powerful and well tested. We're going to start load testing this reasonably soon, and I have no problems reporting back here if people are interested in numbers. I'm wondering if there are other people who have tried a similar setup, or if there are better ways to approach this problem.
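For reference, a client using this mechanism sends its headers first, waits for an interim HTTP/1.1 100 Continue response, and only then sends the body. To find out which of your clients rely on it, a sketch like the following could be used for logging on the PHP side. The function name is made up for this example, and it assumes the webserver passes the Expect header through to PHP, which is not guaranteed.

```php
<?php
// Hypothetical helper: true when the client asked for an interim
// "100 Continue" response before sending its request body.
// Pass in $_SERVER, or any array with the same keys.
function expectsContinue(array $server): bool
{
    $expect = $server['HTTP_EXPECT'] ?? '';
    return strcasecmp(trim($expect), '100-continue') === 0;
}
```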
Lighttpd + PHP fastcgi woes
In trying to get more out of our webservers using a Lighttpd and PHP-FastCGI setup, I've come across some major issues that make it difficult to use. I hope this post will warn people of some of the bugs they might encounter and the workarounds that might need to be implemented until some of these are fixed.
First off, the parent PHP-CGI process spawns n number of children, depending on your PHP_FCGI_CHILDREN. However, if your webserver (lighttpd) is stopped, or restarted, the parent process does not kill its children and they all get orphaned.
The only way to get around this easily is by making sure that as soon as you need to stop or restart your webserver, you do a 'killall php-cgi' while the server is still down. There's a PHP bugreport open, which seems to indicate the issue also happens in Apache. Vote for it!
The second, more severe issue is that when you hit maximum capacity for your PHP backend, lighty will start serving HTTP 500 errors for all PHP requests, and does not seem to stop doing this until the webserver is restarted altogether. Although I'm not completely sure, these bugs seem to be the relevant ones:
So yea, based on this information it turns out that there's a clear need for some smart process killing/webserver restarting scripts if you'd like to switch to lighttpd in a high load environment. I got pretty scared trying this after finding these bugs. Makes me think no one really tried it out under heavy loads, which leads me to some questions.. Hopefully some readers of this feed have some experience here.
- Are you using lighttpd or another 'light' webserver with PHP in a high load environment? Have you experienced similar issues?
- What are good ways around PHP's buggy FCGI behaviour? (The buggy part is that PHP's parent process should return FCGI_OVERLOADED instead of timing out.) Should FCGI be avoided altogether at this point?
- What is a good way to come up with settings for 'max-procs' and 'PHP_FCGI_CHILDREN'? Reading other people's comments on the web, the advice is all over the board, ranging from 1 for max-procs and 200 for PHP_FCGI_CHILDREN to the exact opposite. Supposedly APC is isolated to 1 group of processes, so having bigger groups of processes is important.
- And most importantly, what's your mom's favourite color?
Mime types.. when will people learn?
<rant>
HTTP has an incredibly useful feature: the Content-Type HTTP header, which can be supplied for any url. This allows HTTP clients to easily figure out what type of data they're getting.
Over and over again I see clients ignoring this and making assumptions based on the url. The extension, of all things! This is some artifact inherited from ms-dos, and passed on to different operating systems when GUIs became popular.
Two clear examples I have today (and I'm sure many people will have examples like this)
- Flash's VideoPlayer component. If there's no extension in the url, it will assume it's some kind of xml file.
- iTunes podcasts.. Files have to end with a known extension for iTunes to pick it up as a video or audio file. Even though the Mime type has to be specified in both the RSS feed and the HTTP Header!
WTF?
</rant>
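What a well-behaved client should do instead fits in a few lines. This is only a sketch with a made-up function name: take the media type from the Content-Type header and leave the url alone.

```php
<?php
// Hypothetical helper: decide the media type from the Content-Type response
// header, never from the extension in the url.
function mediaTypeFromContentType(string $contentType): string
{
    // Drop parameters such as "; charset=utf-8" and normalise the case.
    $type = explode(';', $contentType, 2)[0];
    return strtolower(trim($type));
}

$type = mediaTypeFromContentType('Video/MP4; codecs="avc1"');
```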
Firefox gets httpOnly cookies
httpOnly cookies allow you to hide your (session-)cookies from javascript. In the case of an XSS hole in your application, it will make it much harder for a hacker to steal someone's session.
Internet Explorer has supported httpOnly cookies for a long time, and since version 2.0.0.5 Firefox supports this feature too. Apparently Mozilla hasn't openly promoted this new feature yet, because it's still possible to fetch the cookies with XMLHttpRequest. PHP has supported httpOnly cookies and sessions since 5.2.
On HttpOnly, Firefox-specific XSS and this years major Livejournal XSS attack
Yep, that's a long title, but they are all related to each other in some way. In the first few paragraphs I will explain what cookies and XSS are. You might want to skip ahead if you already know this.
Sessions, Cookies
HTTP is stateless. This means that every request to the server is a 'new' one and normally there is no relation to a first or second request. To allow maintaining a session or 'state' between multiple requests, HTTP cookies are used.
A cookie is basically an HTTP header with a tiny piece of information that gets re-sent with every request to the server. A popular way to make use of this is through PHP's $_SESSION system. This sends a cookie with a unique id to the client, which allows PHP to retain a user's information across pages.
XSS
If you allow users to for example comment on one of your pages and allow (certain) html, it is sometimes possible to inject a piece of javascript. There are many tricks to evade the so-called html sanitizers.. strip_tags() is PHP's built-in sanitizer, but it doesn't work really well.. if, for example, you would allow users to use a <p> tag, which might seem harmless, there would be tricks to abuse the style="" or onclick="" attributes, just to name a few.
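A small sketch of the problem: once a tag is on the allow list, strip_tags() lets every attribute on that tag through as well. The comment text is made up for this example.

```php
<?php
$comment = '<p onclick="alert(document.cookie)">Nice post!</p>';

// Allowing <p> also lets every attribute on that tag through.
$stripped = strip_tags($comment, '<p>');
// $stripped is identical to $comment: the onclick handler survives.

// Escaping everything is the blunt but safe alternative when no html is needed.
$escaped = htmlspecialchars($comment, ENT_QUOTES, 'UTF-8');
```

If some html has to be allowed, an allow list has to cover attributes as well as tags, which strip_tags() cannot do.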
XSS and cookies
So how do you abuse javascript and cookies combined?
With PHP's session system, for example, whoever has the contents of the session cookie can take over that session. The hacker would be logged in as you and might be able to change your password and log you out afterwards. The contents of the cookies are available to javascript through document.cookie.
HttpOnly, a solution
Microsoft came up with a way to prevent this from happening, available since Internet Explorer 6.0 (starting from Windows XP SP1). They added an extra piece of information to a cookie that still allows the use of cookies in the way you are used to, but prevents the cookie from being read by javascript (basically it is invisible to javascript). Be sure to check out Microsoft's spec at MSDN.
Safari and Opera quickly started supporting this. Because of this it is becoming pretty useful to use in practice. Remember that this doesn't mean you can just accept any html on your site, you should still always sanitize the bad stuff or not allow it at all! But in the case you missed something, it can make it a lot more difficult for your attacker to steal sessions.
UPDATE: Safari/Opera actually ignore it. My apologies, I didn't check my sources.
Under the hood
Normally, a cookie header will look like this:
Set-Cookie: USER=username; expires=Wednesday, 09-Nov-99 23:12:40 GMT;
But with the HttpOnly, it will look like this:
Set-Cookie: USER=username; expires=Wednesday, 09-Nov-99 23:12:40 GMT; HttpOnly
It's a small change, and all normal browsers should still accept this even if they don't understand the HttpOnly part. There is an exception though, and it goes by the name of IE 5 for mac. This browser won't understand the cookie and totally ignores it. Personally I don't support this browser for any application anymore, as it has too many bugs. But if your boss wants it, this might prevent you from using HttpOnly.
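To show where the flag ends up, here is a sketch that builds the header by hand. The function name is made up for this example, and it leaves out the path, domain and secure attributes.

```php
<?php
// Hypothetical helper that builds the Set-Cookie header line by hand, only to
// show where the HttpOnly flag goes. Path, domain and secure are left out.
function buildSetCookie(string $name, string $value, int $expires, bool $httpOnly): string
{
    $header = 'Set-Cookie: ' . $name . '=' . rawurlencode($value)
        . '; expires=' . gmdate('D, d-M-Y H:i:s', $expires) . ' GMT';
    if ($httpOnly) {
        $header .= '; HttpOnly';
    }
    return $header;
}

$line = buildSetCookie('USER', 'username', time() + 3600, true);
```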
PHP support
A guy named Scott MacVicar created a patch for PHP that adds an extra parameter to setcookie() to enable HttpOnly for your cookies. The patch should also enable this by default for the session system.
If the patch gets accepted we will likely be able to use this in PHP 6.0 and perhaps even PHP 5.2, and I'm looking forward to that. There is a chance though that, because of the IE5/mac breakage, it eventually won't be auto-enabled for the session system.
So what about firefox?
Firefox doesn't support it, and there is currently a bug open for it (Bug #178993). An initial solution was posted over 2 years ago (January 2004), with a few other patches later on, but the mozilla folks refused all of them because they want to maintain the exact format of cookies.txt (the file they use to store cookies), as other applications might rely on that format. A few solutions for that have been posted, but it doesn't seem like a high priority for them.
A workaround for firefox (kind of)
There have been solutions for firefox that also blocked reading of cookies by javascript. Firefox has a magic function __defineGetter__ that can block reading of variables. To do this for all cookies on your site, include the following snippet on top of your html page:
HTMLDocument.prototype.__defineGetter__("cookie",function (){return null;});
However, you can't rely on this! There are still ways to get this cookie if the hacker can somehow create an iframe in your html page. The hacker has a reference to the same cookies if he uses the data: protocol in the src="" attribute. This will still make it a bit harder to steal cookies, so it's not a bad idea to implement. For a longer explanation of this workaround, check out http://www.wisec.it/sectou.php?lang=en.
The LiveJournal case
The same people who submitted the initial firefox-patch (see above) 2 years ago also got hit by the attack in January 2006. Over 46% of accounts were hijacked, which comes to over 900,000 stolen accounts. (Check out their post about their solution.)
The reason they got hit by this is that they allowed users to use remote CSS stylesheets for their pages. CSS used to be only a specification for how html elements look on a page, but over the last few years it has become more dynamic, and now there are ways to exploit CSS with XSS attacks through, for example, IE's non-standard behavior: attribute and Firefox's -moz-binding: attribute (there are more you can exploit, but that's outside the scope of this article). These attributes allow an author to create a custom behavior for an HTML element. The technique to do this for Firefox is called XBL.
Normally, when a page is loaded from domain A and another (in a different frame, for example) is loaded from domain B, malicious scripts from domain B can never access cookies from domain A. This is called a 'same origin check'. This generally works in all browsers for all kinds of content, but XBL is an exception (see bug #324253).
Apparently this requires some major changes in how firefox works. The last comment in this bug is from February this year (2006). I hope they will wake up some day soon and make our life a little bit easier by both supporting HttpOnly and securing XBL.
So the LiveJournal attack could happen because the hackers used XBL in their CSS, which in turn accessed the HTTP cookies through javascript. The attack didn't affect users of other browsers, because: A) they support HttpOnly, and B) even if they didn't, the microsoft equivalent of XBL, .HTC files, doesn't work across different domains.







