Dynamic and static HTTP compression with Apache httpd

on 6 Sep 2006 by Mukund (@muks)

It appears that Planet GNOME isn't using any HTTP data compression when transferring the data from server to client.

HTTP/1.1 200 OK
Date: Wed, 06 Sep 2006 21:06:34 GMT
Server: Apache/2.0.46 (Red Hat)
Last-Modified: Wed, 06 Sep 2006 20:57:21 GMT
Etag: "7002a7-293e2-38a52e40"
Accept-Ranges: bytes
Content-Length: 168930
Content-Type: text/html; charset=utf-8

Having downloaded the index page using curl, it turned out to be 168930 bytes, which is quite large for a web page (and it takes a noticeably long time to download). Running gzip on it at compression level 3 reduced the page to 48113 bytes. Assuming the front page gets about 10000 requests a day, if Planet GNOME used gzip/DEFLATE compression it would save (168930 - 48113) * 10000 ≈ 1.2 GB of bandwidth per day for the front page alone, and would also make download times much faster than they are now for clients (users).

So here's a quick tip on how to go about saving bandwidth for your websites. Almost all modern and popular browsers including Firefox, MSIE, Konqueror and Safari support HTTP compression. There are two ways of achieving HTTP compression using Apache httpd, based on whether you are serving static pages or dynamic pages. It appears that the Planet software creates static webpages which are served by Apache httpd.

Regardless of whether you are serving dynamic or static content, the quickest way to achieve compressed pages with Apache httpd (version >= 2.0) is to use the following configuration in your httpd.conf:

LoadModule deflate_module modules/mod_deflate.so
AddOutputFilterByType DEFLATE text/html text/css text/plain text/xml
DeflateCompressionLevel 3

The LoadModule line loads the Apache httpd module called mod_deflate, which provides the compression functionality. It handles both clients that support compression and clients that don't (the latter receive uncompressed data). The AddOutputFilterByType directive adds a filter called DEFLATE, which compresses data of the specified content types before it is sent to the client. You may want to add more types there, such as application/rss+xml, if you want them compressed too. You may also want to remove some content types such as text/css, as Mozilla browsers have had issues caching CSS for HTTPS pages, sometimes rendering pages without styles; this may have been fixed by now. DeflateCompressionLevel is self-explanatory: level 3 gives good compression results, uses lazy matching and is pretty fast. Don't use anything over level 6, as it burns comparatively many more CPU cycles for very little extra compression.
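If you want to compress more content types, or work around old clients with broken DEFLATE support, the configuration can be extended along these lines (a sketch based on the mod_deflate documentation; adjust the type list to suit your site):

```apache
LoadModule deflate_module modules/mod_deflate.so
AddOutputFilterByType DEFLATE text/html text/plain text/xml text/css application/rss+xml
DeflateCompressionLevel 3

# Netscape 4.x chokes on compressed types other than text/html
BrowserMatch ^Mozilla/4 gzip-only-text/html
# Netscape 4.06-4.08 have even worse problems: disable compression entirely
BrowserMatch ^Mozilla/4\.0[678] no-gzip
# MSIE sends a Mozilla/4 User-Agent but handles compression fine
BrowserMatch \bMSIE !no-gzip !gzip-only-text/html
```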

Now let's get to the other approach: when Apache httpd is serving static content, you don't need to compress the data on every request. You can compress it up-front, just once, when you generate the static content, and have Apache httpd serve the pre-compressed data whenever the client supports compression. This is possible using a feature called content negotiation. Here's how you use it: first create your original static content file; let's call it index.html. Then place a compressed version of it in index.html.compressed:

gzip -9c index.html > index.html.compressed

You can use level 9 here as it's a one-time operation. It's best not to name the file index.html.gz: the .gz suffix is likely already mapped by default to the content type application/x-gzip, which isn't what we want here. Now, create a file called index.html.var in the same directory with the following contents:

URI: index.html
Content-Type: text/html; charset=UTF-8

URI: index.html.compressed
Content-Type: text/html; charset=UTF-8
Content-Encoding: gzip

The .var in the filename is just another extension. This file is called a type-map file; it specifies the alternatives available for content negotiation when the URL for index.html.var is accessed. The first URI section specifies which relative URI (file) to serve by default, and the second specifies which one to serve when the client can accept a Content-Encoding of gzip. So when that URL is accessed, the contents of index.html.compressed or index.html are returned, depending on whether or not the browser supports gzip compression.

We're not done yet. Now, put the following into the .htaccess file in the directory containing these index.html* files:

DirectoryIndex index.html.var
AddHandler type-map .var
AddType text/html .compressed
AddEncoding x-gzip .compressed

The DirectoryIndex directive is self-explanatory. If URLs with index.html or index.html.compressed are accessed directly, their raw contents are returned. The AddHandler directive identifies files with the extension .var as type-map files, as described above. The AddType and AddEncoding directives make files with the extension .compressed show up as compressed HTML content to the browser. Finally, remember to make your links point to the index.html.var file instead of the index.html file. That's it. Simple, no?
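Doing this by hand for every page gets tedious, so here's a small shell sketch that pre-compresses each .html file in the current directory and writes a matching type-map for it, following the naming conventions above:

```shell
#!/bin/sh
# For a given .html file, create a gzip-compressed copy and a type-map
# (.var) file pairing the two for content negotiation.
make_negotiated() {
  f=$1
  gzip -9c "$f" > "$f.compressed"
  cat > "$f.var" <<EOF
URI: $f
Content-Type: text/html; charset=UTF-8

URI: $f.compressed
Content-Type: text/html; charset=UTF-8
Content-Encoding: gzip
EOF
}

# Apply it to every .html file in the current directory.
for f in *.html; do
  [ -e "$f" ] || continue  # skip if the glob matched nothing
  make_negotiated "$f"
done
```

Run it as the last step of whatever generates your static pages, so the .compressed and .var files never go stale.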

The naming conventions of filenames for use in the .var file can be a bit confusing, so tread carefully if you wish to experiment with the filename extensions.

Okay, now that you know how to compress textual content, let's quickly turn to optimizing images. Use pngcrush on all your PNG images to optimize them; the default options should be fine. Similarly, use jpegtran (part of the libjpeg software) with the options -optimize -progressive to losslessly optimize JPEG images. -optimize enables the use of optimized Huffman tables. (Using arithmetic coding for the entropy coder would make images even more compact, but it is patent-ridden and isn't widely used or supported.) -progressive makes many JPEG images smaller, and is also good for large images on the web, since it lets them render progressively in the browser.

So in conclusion, try these tips on your websites, and see how fast they load in browsers.