New HTTP status codes

RFC 6585 has been published quite recently. This document describes 4 new HTTP status codes.

So in case you were wondering, yes.. HTTP is still evolving :), and these new statuses may be quite useful for developing your REST, or otherwise HTTP-based service. This post describes why they are important, and when you should use them.

428 Precondition Required

A precondition is something a client can send along with an HTTP request. This condition needs to be met in order for the request to complete.

A good example is the If-None-Match header, which is often used along with GET requests. If the If-None-Match is specified, the client can request to only receive the response if the ETag changed.

Another example of a precondition is the similar 'If-Match' header. An If-Match header is usually sent along with PUT requests to indicate that the resource should only be updated if it hasn't changed. This is useful if multiple clients are using an HTTP-based service, and they want to make sure they are not overwriting each other's content.

Using the 428 Precondition Required status, the server can now indicate that the client *must* send along one of those headers to perform the request. This is effectively a way for the server to force clients to prevent this 'lost update' problem.
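To make that a bit more concrete, here's a minimal sketch of how a server could decide between 428, 412 and going ahead with a PUT. The function name and the idea of passing in the current ETag are my own assumptions for illustration; it also naively splits the header on commas, which is fine for typical ETags but not a full parser.

```php
<?php
// Hypothetical helper: decide what to do with a PUT request.
// $ifMatch is the raw If-Match header value (or null if absent),
// $currentEtag is the ETag of the stored resource, including its quotes.
// Returns 200 to mean "go ahead with the update".
function preconditionStatus(?string $ifMatch, string $currentEtag): int
{
    if ($ifMatch === null || trim($ifMatch) === '') {
        return 428; // Precondition Required: the client must send If-Match
    }
    if (trim($ifMatch) === '*') {
        return 200; // "*" matches any existing representation
    }
    // If-Match may carry a comma-separated list of entity tags
    $candidates = array_map('trim', explode(',', $ifMatch));
    if (in_array($currentEtag, $candidates, true)) {
        return 200; // the client saw the current version, safe to update
    }
    return 412; // Precondition Failed: the resource changed in the meantime
}
```

A client that receives 428 should fetch the resource, remember its ETag and retry with If-Match.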

429 Too Many Requests

This status code is useful in cases where you want to limit the number of requests a client may make to your API (also known as rate limiting).

In the past different status codes have been used for this, such as '509 Bandwidth Limit Exceeded'. Twitter uses 420 (which is not an officially registered status code). So this was apparently common enough to give it its own code.

So if you limit the number of requests clients may do on your server, 429 Too Many Requests is the way to go. Include a 'Retry-After' response header to indicate to a client when they are allowed to make requests again.
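As a rough sketch, a fixed-window limiter that produces the 429 and a matching Retry-After could look like this. The function name, the default limit of 100 requests per 60 seconds and the in-memory array are all assumptions for the example; a real service would keep the counters in shared storage.

```php
<?php
// Fixed-window rate limiter (illustrative only).
// $hits maps a client id to [windowStart, count] and is updated in place.
// Returns [statusCode, extraResponseHeaders].
function checkRateLimit(array &$hits, string $clientId, int $now, int $limit = 100, int $window = 60): array
{
    // Start a new window for unknown clients or when the old one expired
    if (!isset($hits[$clientId]) || $now - $hits[$clientId][0] >= $window) {
        $hits[$clientId] = [$now, 0];
    }
    $hits[$clientId][1]++;

    if ($hits[$clientId][1] > $limit) {
        // Seconds until the current window ends
        $retryAfter = $hits[$clientId][0] + $window - $now;
        return [429, ['Retry-After' => (string) $retryAfter]];
    }
    return [200, []];
}
```

You would call this at the top of a request with `time()` as `$now`, and send the returned headers along with the status.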

431 Request Header Fields Too Large

I was surprised to see that this was a common enough use case to warrant its own status code, but here it is!

When a client sends HTTP request headers that are too big, either a single header field or all of them together, the server can respond with 431 Request Header Fields Too Large to indicate exactly that.
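In practice the web server usually enforces this before your application even sees the request, but the check itself is simple. Here's a sketch; the function name and both limits are made-up values for illustration, not numbers from the RFC.

```php
<?php
// Illustrative check of request header sizes.
// $headers maps header names to values; limits are in bytes (assumed values).
function headerSizeStatus(array $headers, int $maxField = 8192, int $maxTotal = 32768): int
{
    $total = 0;
    foreach ($headers as $name => $value) {
        // name + ": " + value + CRLF
        $size = strlen((string) $name) + strlen((string) $value) + 4;
        if ($size > $maxField) {
            return 431; // a single header field is too large
        }
        $total += $size;
    }
    // All fields together may also exceed what the server accepts
    return $total > $maxTotal ? 431 : 200;
}
```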

I have no idea why they skipped over 430 though. I tried to search around, but couldn't quite find the reasoning. My best guess is that a lot of people may have mistyped '403 Forbidden' as '430 Forbidden', and they wanted to avoid complications. If you know, let me know!

511 Network Authentication Required

This status code is very interesting to me. You will not have to deal with this if you're writing a server, but it can be important if you're writing a (desktop) HTTP client.

If you move around with your laptop or smartphone a lot, you may have noticed that a lot of public wifi services now require you to accept their license agreement, or just log in before the web works.

This is generally done by intercepting the HTTP traffic and presenting a redirect and login when the user tries to access the web. Quite nasty, but that's the way it is.

These 'intercepting' networks (the RFC calls them captive portals) can have some nasty side effects. There are two great examples mentioned in the RFC to illustrate this.

  • If you hit a website before logging in, the network device intercepts the first request. These devices also tend to have a 'favicon.ico' stored. After logging in, you'll notice that the favicon is now cached for the website you tried to visit, and it may follow you around for quite some time.
  • If a client uses HTTP requests to find documents, the 'network' may respond with a login page, instead of the json or other document you expected. Your client may (in error) assume it's a 'normal' response and use that instead. This can put clients in a broken or irrecoverable state. I've noticed this in real life a few times working on a CalDAV system as well.

So to fix this, 511 Network Authentication Required is introduced.

So if you write an application that runs on a desktop or phone and uses HTTP, you should ideally check for this HTTP response code. In a way, it simply means that the network is not yet available and you should pretty much ignore anything coming back until it is. You could even provide the user with the returned login page, like iOS and OS X 10.7 do.
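A client-side check could be sketched like this. The function name and the returned labels are hypothetical; the content type comparison is an extra guard of my own for networks that still answer with a plain login page instead of a 511.

```php
<?php
// Hypothetical classification of a response the client already received.
// $expectedType is the media type the client asked for, e.g. application/json.
function classifyResponse(int $status, string $contentType, string $expectedType): string
{
    if ($status === 511) {
        // The network wants a login first; nothing in this response
        // came from the server we were trying to reach.
        return 'network-login-required';
    }
    // Strip parameters such as "; charset=utf-8" before comparing
    $actual = strtolower(trim(explode(';', $contentType)[0]));
    if ($actual !== strtolower($expectedType)) {
        return 'unexpected-content'; // possibly an intercepted response
    }
    return 'ok';
}
```

Only a result of 'ok' should be parsed or cached by the application.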

What happened to HTTP authentication?

Rant warning

We enter our usernames and passwords on pretty much all the sites we commonly visit. Authentication is probably one of the first things you're taught when starting to work with PHP. For some reason, in 99% of the cases this is done through an HTML form, with the username and password submitted as a urlencoded string.

You probably know that HTTP also has native authentication, in the form of Basic and Digest authentication (read my older article if you want to know how). Every browser and pretty much any HTTP client supports it too. There are some big benefits to that, because it provides a very standardized mechanism to authenticate a client, whether it's a machine or a human.

What baffles me is that HTTP authentication hasn't been developed further. HTTP Digest is pretty secure by itself, and has some nice features (hashed password, protection against man in the middle and replay attacks, message digests), which is way more than an HTML POST form with a session cookie can provide.

What's missing?

  1. There's no way for a user to see if they are authenticated to a site. Perhaps a username in the address bar?
  2. Pretty much everybody always wonders how they can code a logout mechanism. Because there are no session cookies that can be destroyed, there are some hacks that trick the browser into asking for credentials again. There should be no need for the server to provide this functionality. The browser knows it's logged in, and HTTP applications are stateless. We need an in-browser log-out button.
  3. Less important: some JavaScript hooks that allow developers to still use HTML forms to set up HTTP authentication.

Mozilla is doing some interesting things with their Account Manager add-on for Firefox, but even that add-on does not support HTTP authentication. With Account Manager they are jumping through some hoops with JavaScript hooks so it works with regular authentication systems, but you'd think that if HTTP authentication was used, things could be a lot more straightforward. The browser knows exactly who is logged in.

So, does anyone know how this happened? Is there a major flaw in HTTP authentication I'm just missing?

OS X WebDAV and Chunked Transfer Encoding

While OS X's WebDAV implementation is quite slow, it is mostly pretty decent. The client uses the little-used chunked transfer encoding for PUT requests, which allows it to send big files without knowing up front exactly how big the file is going to be. The headers of such a request look like this:

  PUT /image.png HTTP/1.1
  Host: example.org
  User-Agent: WebDAVFS/1.8 (01808000) Darwin/10.2.0 (i386)
  Accept: */*
  X-Expected-Entity-Length: 10316
  If: (<opaquelocktoken:44445502-c253-02e6-7198-45b36c96e8c7>)
  Connection: close
  Transfer-Encoding: Chunked

While this is a perfectly legal HTTP request, some web servers choke on it. Both Nginx and Lighttpd respond with HTTP 411 Length Required. This would have been valid for HTTP/1.0 servers, but if they claim to support HTTP/1.1 they must accept these requests.
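For reference, the chunked format itself is simple: each chunk is its size in hexadecimal, a CRLF, the data and another CRLF, and a zero-sized chunk marks the end. Here's a small sketch of both directions. The function names are mine, and the decoder ignores chunk extensions and trailers, so treat it as an illustration rather than a complete implementation.

```php
<?php
// Encode a body using chunked transfer encoding.
function chunkEncode(string $body, int $chunkSize = 4096): string
{
    $out = '';
    foreach (str_split($body, $chunkSize) as $chunk) {
        if ($chunk === '') {
            continue; // older PHP versions return [''] for an empty string
        }
        $out .= dechex(strlen($chunk)) . "\r\n" . $chunk . "\r\n";
    }
    return $out . "0\r\n\r\n"; // terminating zero-sized chunk
}

// Decode a chunked body back into the original bytes.
function chunkDecode(string $raw): string
{
    $body = '';
    $pos = 0;
    $length = strlen($raw);
    while ($pos < $length) {
        $eol = strpos($raw, "\r\n", $pos);
        if ($eol === false) {
            break;
        }
        // The size line may carry extensions after a ';', which we drop
        $size = (int) hexdec(trim(explode(';', substr($raw, $pos, $eol - $pos))[0]));
        if ($size === 0) {
            return $body; // last chunk reached
        }
        $body .= substr($raw, $eol + 2, $size);
        $pos = $eol + 2 + $size + 2; // skip the data and its trailing CRLF
    }
    throw new InvalidArgumentException('Chunked body ended without a terminating chunk');
}
```

Because every chunk announces its own size, the sender never needs a Content-Length header, which is exactly why the client above can get away with only the X-Expected-Entity-Length hint.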

Apache + mod_php does this fine, but just recently I got a report from someone using Apache + fastcgi + php. In this case the request body never arrived in PHP which can unfortunately result in silent data loss.

So I guess that's a bit of a warning: so far, OS X WebDAV only plays nicely with Apache + mod_php servers.

WebDAV-related RFCs

In an attempt to better understand the WebDAV standards space, I made up a non-scientific graph of all the specs and dependencies. I'd like to get started with CalDAV, but I have a few other specs to implement before I'll be able to do that.

[Figure: graph of WebDAV-related RFCs and their dependencies]

The next one for me on the list is ACL. Attempting to integrate these new features within the existing system so far has proven to be very challenging. The big reason is my (perhaps high) requirements on how this is supposed to work:

  • It shouldn't touch the existing WebDAV system (at all), because 99% of the users will not use ACL.
  • The interface & implementation should still be understandable if you are implementing ACL.
  • I like the existing WebDAV class structure as it stands, so if I have to make changes in the design, it should still be easy to grasp.

HTTP Basic and Digest authentication with PHP

HTTP authentication is quite popular for web applications. It is pretty easy to implement and works with a range of HTTP clients, not to mention your browser.

Basic Auth

The two main authentication schemes are 'basic' and 'digest'. Basic is pretty easy to implement and appears to be the most common:

  <?php

  $username = null;
  $password = null;

  // mod_php
  if (isset($_SERVER['PHP_AUTH_USER'])) {
      $username = $_SERVER['PHP_AUTH_USER'];
      $password = $_SERVER['PHP_AUTH_PW'];

  // most other servers: the request header is called Authorization,
  // so it shows up as HTTP_AUTHORIZATION
  } elseif (isset($_SERVER['HTTP_AUTHORIZATION'])) {

      if (stripos($_SERVER['HTTP_AUTHORIZATION'], 'basic') === 0) {
          // Strip the "Basic " prefix and decode "username:password".
          // Split into at most 2 parts, because the password may contain colons.
          $decoded = base64_decode(substr($_SERVER['HTTP_AUTHORIZATION'], 6));
          list($username, $password) = array_pad(explode(':', $decoded, 2), 2, '');
      }

  }

  if (is_null($username)) {

      header('WWW-Authenticate: Basic realm="My Realm"');
      header('HTTP/1.0 401 Unauthorized');
      echo 'Text to send if user hits Cancel button';

      die();

  } else {
      // Escape user input before echoing it into HTML
      echo '<p>Hello ' . htmlspecialchars($username) . '.</p>';
      echo '<p>You entered ' . htmlspecialchars($password) . ' as your password.</p>';
  }

  ?>

It may be a bit hard to see, but you might have noticed the username and password are sent over the wire using base64 encoding, which anybody can decode. That's not really secure, unless you have SSL in place.
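To show how little protection base64 gives you, here's a tiny sketch that builds the header value a client would send and then decodes it again. The function names and the admin/1234 credentials are just examples.

```php
<?php
// Build the value of the Authorization header for Basic auth.
function basicHeader(string $user, string $pass): string
{
    return 'Basic ' . base64_encode($user . ':' . $pass);
}

// Recover [username, password] from such a header value.
// No key or secret is needed; base64 is an encoding, not encryption.
function basicDecode(string $header): array
{
    return explode(':', base64_decode(substr($header, 6)), 2);
}
```

Anyone who can read the request can run the second function, which is why Basic only makes sense over an encrypted connection.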

Digest

Digest is designed to be more secure. The password is never sent over the wire in plain text, but rather as a hash. The implication of using a hash is that it cannot be reversed to get the password back. We can only validate the hash by applying the same hash function to the password we have. If the hashes match, the password was correct.

Let's first see how Digest auth should work:

Client requests url

  GET / HTTP/1.1

Server requires authentication

  HTTP/1.1 401 Unauthorized
  WWW-Authenticate: Digest realm="The batcave",
      qop="auth",
      nonce="4993927ba6279",
      opaque="d8ea7aa61a1693024c4cc3a516f49b3c"

Client authenticates

  GET / HTTP/1.1
  Authorization: Digest username="admin",
      realm="The batcave",
      nonce=49938e61ccaa4,
      uri="/",
      response="98ccab4542f284c00a79b5957baaff23",
      opaque="d8ea7aa61a1693024c4cc3a516f49b3c",
      qop=auth, nc=00000001,
      cnonce="8d1b34edb475994b"

Information coming from the server:

realm: A string which will be used within the UI and as part of the hash.
qop: Can be auth and auth-int and has influence on how the hash is created. We use auth.
nonce: A unique code, which will be used within the hash and needs to be sent back by the client.
opaque: This can be treated as a session id. If this changes the browser will deauthenticate the user.

Information from the client:

username: The supplied username.
realm: Same as server response.
nonce: Same as server response.
uri: The authentication uri.
response: The validation hash.
opaque: Same as server response.
qop: Same as server response.
nc: Nonce-count. This is a hexadecimal serial number for the request. The client should increase this number by one for every request.
cnonce: A unique id generated by the client.

So how do we know if the password was correct? We can validate using the following formula (pseudo code).

  A1 = md5(username:realm:password)
  A2 = md5(request-method:uri) // request method = GET, POST, etc.
  Hash = md5(A1:nonce:nc:cnonce:qop:A2)

  if (Hash == response)
      // success!
  else
      // failure!

Or, using PHP:

  <?php

  $realm = 'The batcave';

  // Just a random id
  $nonce = uniqid();

  // Get the digest from the http header
  $digest = getDigest();

  // If there was no digest, show login
  if (is_null($digest)) requireLogin($realm, $nonce);

  $digestParts = digestParse($digest);

  // If any of the required parts is missing, show login again
  if ($digestParts === false) requireLogin($realm, $nonce);

  $validUser = 'admin';
  $validPass = '1234';

  // Based on all the info we gathered we can figure out what the response should be
  $A1 = md5("{$validUser}:{$realm}:{$validPass}");
  $A2 = md5("{$_SERVER['REQUEST_METHOD']}:{$digestParts['uri']}");

  $validResponse = md5("{$A1}:{$digestParts['nonce']}:{$digestParts['nc']}:{$digestParts['cnonce']}:{$digestParts['qop']}:{$A2}");

  // hash_equals compares in constant time (PHP 5.6 and up)
  if (!hash_equals($validResponse, $digestParts['response'])) requireLogin($realm, $nonce);

  // We're in!
  echo 'Well done sir, you made it all the way through the login!';

  // This function returns the digest string, or null if there is none
  function getDigest() {

      $digest = null;

      // mod_php
      if (isset($_SERVER['PHP_AUTH_DIGEST'])) {
          $digest = $_SERVER['PHP_AUTH_DIGEST'];
      // most other servers: the header is called Authorization
      } elseif (isset($_SERVER['HTTP_AUTHORIZATION'])) {

          if (stripos($_SERVER['HTTP_AUTHORIZATION'], 'digest') === 0) {
              // Strip the "Digest " prefix
              $digest = substr($_SERVER['HTTP_AUTHORIZATION'], 7);
          }
      }

      return $digest;

  }

  // This function forces a login prompt
  function requireLogin($realm, $nonce) {
      header('WWW-Authenticate: Digest realm="' . $realm . '",qop="auth",nonce="' . $nonce . '",opaque="' . md5($realm) . '"');
      header('HTTP/1.0 401 Unauthorized');
      echo 'Text to send if user hits Cancel button';
      die();
  }

  // This function extracts the separate values from the digest string
  function digestParse($digest) {
      // protect against missing data
      $needed_parts = array('nonce'=>1, 'nc'=>1, 'cnonce'=>1, 'qop'=>1, 'username'=>1, 'uri'=>1, 'response'=>1);
      $data = array();

      // Matches both key="quoted value" and key=unquoted-value
      preg_match_all('@(\w+)=(?:(?:")([^"]+)"|([^\s,$]+))@', $digest, $matches, PREG_SET_ORDER);

      foreach ($matches as $m) {
          $data[$m[1]] = $m[2] ? $m[2] : $m[3];
          unset($needed_parts[$m[1]]);
      }

      return $needed_parts ? false : $data;
  }

  ?>

As you can see, we need to have a plain-text version of the password in order to validate the user. It's not a good idea to store the plain-text password, so it's strongly recommended to store the result of $A1 instead.
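Storing $A1 could be sketched as follows. The function names are my own; the idea is that the first function runs once when the password is set, and only its result is kept. Keep in mind that the stored value is tied to the username and the realm, so changing either means the password has to be set again.

```php
<?php
// Run once when the user picks a password; store the result, not the password.
function digestHa1(string $user, string $realm, string $pass): string
{
    return md5("{$user}:{$realm}:{$pass}");
}

// Compute the response we expect from the client for qop=auth, using the
// stored value and the parsed parts of the Authorization header.
function digestResponse(string $ha1, string $method, array $parts): string
{
    $ha2 = md5("{$method}:{$parts['uri']}");
    return md5("{$ha1}:{$parts['nonce']}:{$parts['nc']}:{$parts['cnonce']}:{$parts['qop']}:{$ha2}");
}
```

Validation then becomes a comparison between `digestResponse()` and the `response` value the client sent, without the plain-text password ever being needed on the server.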

Security improvements

  • It's smart to validate the contents of opaque, nonce and realm. If you have the data stored on the server, why not check it.
  • The nc should be an ever increasing number. You could store the number and track it to make sure it doesn't make any big jumps. You don't want to be extremely strict about the sequence, because you might miss a number, and requests could come in out of order.
  • 'qop' is quality of protection. This serves as an integrity code for the request. A hacker could steal all your HTTP Digest headers and simply change the body to make it do something else. If 'qop' is set to 'auth', only the requested uri will be taken into consideration. If 'qop' is 'auth-int' the body of the request will also be used in the hash. (A2 = md5(request-method:uri:md5(request-body))).
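The difference between the two qop values only affects A2, which a small sketch makes clear. The function name is hypothetical; the formula is the one from the last bullet above.

```php
<?php
// Compute A2 for qop=auth or qop=auth-int.
function digestA2(string $method, string $uri, string $qop, string $body = ''): string
{
    if ($qop === 'auth-int') {
        // The hash of the request body becomes part of A2, so tampering
        // with the body invalidates the response hash.
        return md5("{$method}:{$uri}:" . md5($body));
    }
    // With plain auth only the method and uri are protected
    return md5("{$method}:{$uri}");
}
```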


Apache speed and reverse proxies

In our environment we use Apache everywhere. Its PHP integration has so far proven superior. Now we're dealing with higher loads and we've hit some limitations.

One of the problems we have is Apache's heaviness. Each of our apache2 worker processes eats up around 20 megabytes of memory, which with 3 GB of memory brings us to a MaxClients setting of around 150. Rasmus seems to think that's a pretty high setting, but based on the easy calculation (memory available for apache / size of an apache process) it works out for us.
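That easy calculation, written out as a sketch. The function name is made up, and it assumes all 3 GB are available to Apache, which is optimistic on a real machine.

```php
<?php
// Rough MaxClients estimate: memory available to Apache divided by the
// size of a single Apache process, both in megabytes.
function estimateMaxClients(int $availableMb, int $processMb): int
{
    return intdiv($availableMb, $processMb);
}

// 3 GB of memory and 20 MB per process
$maxClients = estimateMaxClients(3 * 1024, 20);
```

With our numbers this comes out at 153, which we round down to 150 to leave a little headroom.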

Effectively this means we can serve approximately that many parallel requests on this machine. It is therefore very much in our interest to get every response out as quickly as possible, increasing the number of requests we can handle per second.

Going beyond this 150 number could cause Linux to start using swap. This is bad, because it will add latency to the response, which in turn will result in connections staying open longer.

Since we're sending everything over the web, there is a standard latency. Information traveling to the other side of the globe will at least take 67ms because we're restricted to the speed of light. This doesn't even take non-direct routes nor other hardware latency into account. According to Till this all adds up to the time a single Apache process takes up before working on the next request.
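The 67ms figure can be checked with a quick calculation. I'm assuming roughly 20,000 km to the other side of the globe (about half the earth's circumference) and the speed of light in a vacuum; the function name is just for illustration.

```php
<?php
// Minimum one-way travel time in milliseconds over a given distance,
// at the speed of light in a vacuum (299,792.458 km/s).
function minLatencyMs(float $distanceKm): float
{
    return $distanceKm / 299792.458 * 1000;
}

$halfwayAroundTheGlobe = (int) round(minLatencyMs(20000));
```

Light in fibre is slower than in a vacuum, so the real lower bound is even higher.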

The reverse proxy

There are a couple of webservers which seem to be optimized for serving lots of clients. Lighttpd got a lot of traction earlier, but the project seems to have slowed down a lot, as the much anticipated 1.5 release has been under development for almost 2 years. nginx seems to have taken its place in terms of disruptiveness. These servers are much more lightweight, and are supposed to be faster at delivering static files.

Much like Till, we've had issues hooking PHP directly into these servers. Till suggests the solution of actually placing nginx in front of Apache (on the same machine) as a reverse proxy. Nginx takes care of serving static files and proxies any PHP request to Apache. The concept is that Apache can push out the response as quickly as possible, and while Nginx is working on delivering it to the (slow) client Apache can take on other work.

The thing that bothers me with this setup is the need for 2 webserver products to achieve a single task. This implies that neither of them is adequate on its own to do the job.

On the other hand, this type of setup is also what a lot of people seem to be doing by placing Squid in front of their webservers, although that tends to happen on separate hardware.

HTTP/1.1 100 Continue

All of a sudden we noticed a problem we saw earlier with Lighttpd (Bug #1017) was also an issue in nginx (couldn't find bug or bug tracker at all). Neither of them seems to support the Expect: 100-continue header. While no browser actually sends these headers, we have webservices running which are directly accessed by other types of HTTP clients. Losing support for this HTTP functionality would instantly break their applications, which is unacceptable.
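For those who haven't run into it: a client that sends Expect: 100-continue sends only the request headers at first and waits. The server either answers with the interim 100 Continue, after which the client sends the body, or it answers with a final status right away. A sketch of the server-side decision, with a hypothetical function name and assuming header names were already lowercased:

```php
<?php
// Decide how to react to the Expect request header.
// Returns null when there is no expectation to handle.
function expectStatus(array $headers): ?int
{
    if (!isset($headers['expect'])) {
        return null; // handle the request as usual
    }
    if (strtolower(trim($headers['expect'])) === '100-continue') {
        return 100; // tell the client to go ahead and send the body
    }
    return 417; // Expectation Failed: we don't understand this expectation
}
```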

So now we're actually looking at Squid for performing that task. Squid is powerful and well tested. We're going to start load testing this reasonably soon, and I have no problems reporting back here if people are interested in numbers. I'm wondering if there are other people who have tried a similar setup or if there are better ways to approach this problem.

Mime types.. when will people learn?

HTTP has an incredibly useful feature: the Content-Type HTTP header, which can be supplied for any url. This allows HTTP clients to easily figure out what type of data they're getting.

Over and over again I see clients that don't use this and make assumptions based on the url instead. The extension, of all things! This is an artifact inherited from MS-DOS, and passed on to different operating systems when GUIs became popular.

Two clear examples I have today (and I'm sure many people will have examples like this)

  • Flash's VideoPlayer component. If there's no extension in the url, it will assume it's some kind of XML file.
  • iTunes podcasts.. Files have to end with a known extension for iTunes to pick them up as a video or audio file. Even though the MIME type has to be specified in both the RSS feed and the HTTP header!

WTF?
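It's not like doing it properly is hard. Here's a sketch of what a client could do with the header it already received; the function names are mine, and the url is deliberately not looked at anywhere.

```php
<?php
// Extract the media type from a Content-Type header value,
// dropping parameters such as charset or codecs.
function mediaType(string $contentTypeHeader): string
{
    return strtolower(trim(explode(';', $contentTypeHeader)[0]));
}

// Decide if a response is a video based on what the server told us.
function isVideo(string $contentTypeHeader): bool
{
    return strncmp(mediaType($contentTypeHeader), 'video/', 6) === 0;
}
```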


About

My name is Evert, and I've been writing semi-regularly on this blog since 2006.
