't Bijstere spoor

A blog about Web development

Subversion 1.5 for Debian Etch

The Subversion team released version 1.5 recently, with some really tight features such as changelists and merge tracking.

In our shop we standardized on Debian, and it will very likely take until 5.0 (Lenny) before we get access to it. Normally I would just compile from source, but since this has to be done on multiple servers, I decided to backport Subversion from Lenny and create a nice little .deb package.

Here are my steps (tested only on my own machine, so please try at your own risk).

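  1. We'll start off by creating a directory for this process and moving into it.
    mkdir subversion
    cd subversion
  2. Next, download the source package files (the .orig.tar.gz and the .diff.gz) from the Debian packages site into this directory. The download links are on the right.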
  3. Now, unzip them.
    gunzip subversion_1.5.0dfsg1-4.diff.gz
    tar xfvz subversion_1.5.0dfsg1.orig.tar.gz
  4. The first file is a patch with Debian-specific changes. We'll need to apply this patch to the source tree.
    patch -p0 < subversion_1.5.0dfsg1-4.diff
  5. Enter the source directory
    cd subversion-1.5.0dfsg1/
  6. We'll need to change some file permissions to make this work.
    chmod 755 debian/rules
  7. Now we switch to the root user, because we need to install some dependencies.
    # Assuming you're root
    apt-get build-dep subversion 
    apt-get install python-all-dev libneon26-dev quilt libsasl2-dev fakeroot debhelper
  8. Switch back from root to your normal user.
  9. I've had some issues building the Java bindings (JavaHL). To get around this, we'll need to disable them. Do this by opening the 'debian/rules' file and changing the value on the line that sets 'ENABLE_JAVAHL' to no. For me this was on line 21.
    ENABLE_JAVAHL := no
  10. Now we can get started building the package.
    dpkg-buildpackage -rfakeroot -uc -b -d
  11. Compiling!
  12. If everything went well, you should end up with about 10 .deb packages in the directory right above the source directory. As root, you can simply install all of them with:
    dpkg -i *.deb

Preventing XSS in Javascript strings

Escaping user input in your HTML is essential for preventing the world's #1 vulnerability.

When you're embedding user input into JavaScript, a simple htmlspecialchars won't cut it. You'll need to make sure you're escaping other things as well, like \n (line endings) and \ (backslashes). Google Doctype has a good list of characters in need of proper escaping to prevent users from breaking your JavaScript.

However, when I asked whether a simple string replacement would be good enough, the members of the Web security mailing list gave me a different answer.

When escaping or filtering output using a blacklist (such as the one published on Google Doctype), browser and Unicode escaping bugs are not taken into consideration. Some new vulnerability might appear in the future, which would immediately open a hole in your app. For this reason it's wiser to go with a much more defensive white-list approach, essentially only letting through the things you know are safe.
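
To make the white-list idea concrete, here is a minimal Python sketch. It is not Reform's actual code: the function name js_string_escape and the exact set of characters treated as safe are assumptions made up for this illustration. Anything outside the safe set is written as a \xHH or \uHHHH escape, so a character nobody thought of today still gets escaped tomorrow.

```python
import string

# Assumption for this sketch: only ASCII letters, digits and a few
# harmless punctuation characters are considered safe.
SAFE_CHARS = frozenset(string.ascii_letters + string.digits + " .,-_")


def js_string_escape(value: str) -> str:
    """Escape value for use inside a quoted JavaScript string literal."""
    out = []
    for ch in value:
        if ch in SAFE_CHARS:
            out.append(ch)
            continue
        code = ord(ch)
        if code <= 0xFF:
            out.append("\\x%02X" % code)
        elif code <= 0xFFFF:
            out.append("\\u%04X" % code)
        else:
            # Characters outside the BMP become a UTF-16 surrogate pair.
            code -= 0x10000
            high = 0xD800 + (code >> 10)
            low = 0xDC00 + (code & 0x3FF)
            out.append("\\u%04X\\u%04X" % (high, low))
    return "".join(out)


if __name__ == "__main__":
    user_input = "'; alert(1); //"
    print("var myString = '%s';" % js_string_escape(user_input))
```

The price of being this strict is the same one mentioned for Reform: every non-ASCII character turns into a six-character escape.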

Introducing Reform

Reform is a tool that does exactly this. Reform allows you to escape your data for a javascript, xml, html or vbscript (yes it still exists) context. It provides libraries for Java, .NET, PHP, Perl, Python, Javascript and ASP. Pretty cool!

One dislike I have is that it only considers a really small set of Unicode codepoints safe. Especially when dealing with non-Latin languages, this is going to add a great deal to the bandwidth usage and hurt the legibility of your source code. One would think there have to be more ranges that can be considered 'safe'.

PHP example:

<?php

// Assuming the Reform class is included..

echo '<script type="text/javascript"> var myString = ', Reform::JsString($userInput), '; </script>';

?>

I made a couple of changes in the PHP version, specifically:

  • Prepended the 'static' keyword to every method to make it work in PHP5's strict mode.
  • Removed the UTF-8 checks, since I'm in a controlled environment: mbstring is installed, and the internal encoding is UTF-8.
  • Added a parameter to Reform::JsString to not automatically put the string between quotes (').

Converting line-endings with ViM

I got my hands on a file containing old Mac-style line-endings (\r), which needed to be converted into Unix line-endings (\n).

Normally I would just do a simple search and replace, with:

:%s/\r/\n/g

Oddly enough, this actually gave me null-characters (0x00) instead of the expected \n. After some browsing, this seems to be the correct command:

:%s/\r/\r/g

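I had no clue why at first, as this command looks like it should not have any effect (replacing \r with \r). It turns out Vim treats the two sides differently: in the search pattern \r matches a carriage return, but in the replacement \r inserts a line break, while \n inserts a null character. But yeah, it worked :S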
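
Outside of Vim, the same conversion can be sketched in a few lines of Python. The function names here are made up for this example. It works on bytes so the file's encoding is left alone, and it handles \r\n before bare \r so Windows line-endings don't turn into two newlines.

```python
def to_unix_line_endings(data: bytes) -> bytes:
    """Convert CRLF and bare CR line endings to LF."""
    # Replace CRLF first, otherwise each CRLF would become two newlines.
    return data.replace(b"\r\n", b"\n").replace(b"\r", b"\n")


def convert_file(path: str) -> None:
    """Rewrite the file at `path` in place with Unix line endings."""
    with open(path, "rb") as handle:
        data = handle.read()
    with open(path, "wb") as handle:
        handle.write(to_unix_line_endings(data))
```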


Google and Yahoo start indexing SWFs

Via: Theo Hultberg.

An odd story caught my attention recently, and I've been meaning to put my thoughts down.

I've often been asked about indexing Flash content lately, and the recent announcement by Google only added to the stir.

It's odd, because even Adobe employees seem completely clueless about what it means and what the implications are.

Quote from Ryan Stewart:

So what does that mean? We are giving a special, search-engine optimized Flash Player to Yahoo and Google which is going to help them crawl through every bit of your SWF file. This Flash Player will act just like a person would in some cases. It will click on your buttons, it will move through the states of your application, get data from the server when your application normally would, and it will capture all of the text and data that you’ve got inside of your Flash-based application. We’ve basically provided a very powerful looking glass into SWF files so Google and Yahoo can pull out meaningful information.

This is in sharp contrast with what Google is saying:

We currently do not attach content from external resources that are loaded by your Flash files. If your Flash file loads an HTML file, an XML file, another SWF file, etc., Google will separately index that resource, but it will not yet be considered to be part of the content in your Flash file.

So essentially, Google will index your SWF, but not the actual content it loads. Most modern Flash apps don't hardcode any textual content these days, and will likely load most of their data from the server. Most importantly, I feel the SWF should not be indexed at all. SWF is middleware: it is responsible for delivering content to the user, and for any serious web application it should not be the actual content itself.

One more gem from the Google blog post:

That said, you should be aware that Google is now able to see the text that appears to visitors of your website. If you prefer Google to ignore your less informative content, such as a "copyright" or "loading" message, consider replacing the text within an image, which will make it effectively invisible to us.

So where is this coming from?

My guess: one argument when choosing between HTML and Flash to deliver your content is that Flash is not SEO-friendly. Getting this message out allows pro-Flash people to fight back a little. It definitely feels like this whole announcement has little to do with the technology, and much more with putting the Flash brand in a better light.

How would one actually make SWFs SEO-friendly?

Just don't. Make sure the content is available on the web in an alternative format. Often your Flash content is stored in a database (or an XML file for smaller sites). Pick up your favorite server-side scripting language, and make sure the content is also available in an indexable format. Using some fancy CSS and JavaScript, you can make sure the content is replaced by the Flash content when a regular user visits.

If you do this, all the normal SEO rules apply. As a side effect, the user benefits as well: your content degrades nicely for older browsers, mobile browsers, people with disabilities, you name it. This works because search engines are actually trying to find 'quality content' based on your search query.

Last but not least, XHTML is a form of XML. If you use XHTML as a datasource for your content, search engines can also access it directly.
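
The server-side half of this can be sketched in a few lines. This is only an illustration in Python: the render_fallback function, the 'flash-content' id and the article fields are all made up for the example, and in practice the records would come from the same database or XML file that feeds the SWF. The output is plain HTML inside a container that a script can later swap for the Flash movie.

```python
from html import escape


def render_fallback(articles):
    """Render articles as plain, indexable HTML.

    The surrounding div is the element a client-side script
    would replace with the Flash content for regular visitors.
    """
    parts = ['<div id="flash-content">']
    for article in articles:
        parts.append("  <h2>%s</h2>" % escape(article["title"]))
        parts.append("  <p>%s</p>" % escape(article["body"]))
    parts.append("</div>")
    return "\n".join(parts)


if __name__ == "__main__":
    print(render_fallback([{"title": "Hello", "body": "First post"}]))
```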


IE8 comprehensive protection

Today on the IE blog a big announcement was made regarding the upcoming security features in Internet Explorer 8.

Definitely check it out! Among other things, it includes an XSS protection filter, HTML sanitizing built straight into the scripting engine, and a way to disable the infamous 'content sniffing'. I'd still prefer the content-sniffing 'feature' to be opt-in instead of the proposed opt-out solution, but hey, at least it allows us to plug the hole.

To make sure a file is really treated as text/plain, serve the document with the following Content-Type header:

Content-Type: text/plain; authoritative=true;
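
As a rough sketch of what sending that header looks like from application code, here is a tiny WSGI app in Python. The app itself is hypothetical, and the header value is copied as-is from the announcement, so treat it as the proposal described there and not as a tested recipe.

```python
def app(environ, start_response):
    """Minimal WSGI app that serves its body as authoritative text/plain."""
    body = b"<b>this should be shown as text, not rendered</b>\n"
    headers = [
        # Header value taken verbatim from the announcement.
        ("Content-Type", "text/plain; authoritative=true;"),
        ("Content-Length", str(len(body))),
    ]
    start_response("200 OK", headers)
    return [body]


if __name__ == "__main__":
    # Serve on localhost:8000 using the standard library's reference server.
    from wsgiref.simple_server import make_server
    make_server("localhost", 8000, app).serve_forever()
```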

I have to say, I'm quite impressed by how IE is catching up with things like standards and security.