Archive for the 'performance' Category

Conditional comments block downloads

Sunday, May 23rd, 2010

I came across this blog post (via @pornelski and @souders) where Markus has stumbled upon a case where an IE6-only stylesheet included with a conditional comment blocks the downloads in IE8. Whaaat?

I had to dig in. To give you a summary: it turned out that any conditional comment, not only one for an extra CSS file, will block further downloads until the main CSS file arrives. Also, the solution offered in the blog post (using X-UA-Compatible) seems to be more of an accident caused by a comment left in by mistake.

Check out the tests.

Base page

The first test is the base page. It follows a pretty common style pattern - CSS at the top, a bunch of images in the middle, JS at the bottom.

<!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.01//EN"
   "http://www.w3.org/TR/html4/strict.dtd">

<html>
<head>
    <title>The base page</title>
    <link type="text/css" rel="stylesheet"
          href="http://tools.w3clubs.com/pagr/1.expires.css">
</head>
<body>
<p>
    <img src="http://tools.w3clubs.com/pagr/1.expires.png" alt="1">
    <img src="http://tools.w3clubs.com/pagr/2.expires.png" alt="2">
    <img src="http://tools.w3clubs.com/pagr/3.expires.png" alt="3">
    <img src="http://tools.w3clubs.com/pagr/4.expires.png" alt="4">
</p>
<script type="text/javascript"
        src="http://tools.w3clubs.com/pagr/1.expires.js"></script>
</body>
</html>

The waterfall produced by WebPageTest looks like so:

base page

The page is here, the results of the WebPageTest's test are here.

Conditional IE6 stylesheet

Adding a second stylesheet to the head like so:

<head>
  <title>base page</title>
  <link type="text/css" rel="stylesheet"
        href="http://tools.w3clubs.com/pagr/1.expires.css">
  <!--[if IE 6 ]>
    <link type="text/css" rel="stylesheet"
          href="http://tools.w3clubs.com/pagr/2.expires.css">
  <![endif]-->
</head>

Turns out that this conditional comment blocks further downloads until the main CSS arrives.

Test page, test results, waterfall:

CC style page

Just like that the total time to onload went up from 1 second to almost 1.3 seconds. Ouch.

And this is because of an IE6 stylesheet, which IE8 has no use for. My wild guess is that IE needs to parse through those conditional comments and treats them sort of like inline script. And we know that inline scripts following a stylesheet tend to block.

Conditional markup

What if we don't include an IE6-specific stylesheet, but use conditional comments to write different body tags with different class names, as described by Paul Irish?

<!--[if IE 6]> <body class="ie6"> <![endif]-->
<!--[if !IE]><!--> <body> <!--<![endif]-->

Turns out that these markup conditional comments will also block downloads until the CSS arrives.

Test page, test results, waterfall looks as bad (the same blocking).

Browser-sniffing comments

I blogged about this a few days ago - using conditional comments to do the browser sniffing and include appropriate CSS - one for normal browsers and a complete alternative for IE6,7.

<head>
  <!--[if lte IE 7]>
    <link type="text/css" rel="stylesheet"
          href="http://tools.w3clubs.com/pagr/2.expires.css">
  <![endif]-->
  <!--[if gt IE 7]><!-->
    <link type="text/css" rel="stylesheet"
          href="http://tools.w3clubs.com/pagr/1.expires.css">
  <!--<![endif]-->
</head>

Turns out this is OK to do. The conditional comments are processed before the download is initiated, so there's nothing to block on after the stylesheet. Yeey!

Test page, test results, waterfall looks like the first one for the base page.

X-UA-Compatible not a solution

The blog post suggested using the X-UA-Compatible meta tag to say that the UA is the latest IE possible.

<meta http-equiv="X-UA-Compatible" content="IE=edge">

It didn't work for me.

Test page, test results, waterfall like the second (the blocking) one.

Looking closely at the original test page (which I dug out thanks to the screenshots), I noticed that it contains a dangling comment. When I put that dangling comment in a test page, the blocking effect was gone. But this is a bug. In fact the comment shows up on the page! My wild guess is that this improper comment invalidates the following one and that's why there's no blocking. I guess that IE6 will not load the stylesheet, but I didn't test.

So my tests - with X-UA meta tag (results) and with comment bug (results)

Conclusions, conclusions

To summarize, if you worry about performance, don't use conditional comments.

There might be an exception when you put a script in the head after the CSS - then there are two blocking things, so the effect of the conditional comment is not visible. But that's still bad for performance, there's still blocking.

It's OK to use the browser-sniffing comments approach, provided, of course, that there's no other CSS file before the sniff. If there is one, it will block.

Best - just use _ and * hacks, go with a single CSS and forget sniffing.

Update: empty conditional comment

Thanks to Markus, who kept looking into the extra conditional comment, comes a solution: an empty conditional comment early on solves the blocking issue. By "early on" I mean before the main CSS which caused the blocking.

I did two more tests to validate the solution and it absolutely works.

One test has empty comment + conditional comment for writing Paul's body tags. The empty comment (aka the solution) is right before the blocking CSS file.

<head>
    <title>base page</title>
    <!--[if IE 6]><![endif]-->
    <link type="text/css" rel="stylesheet"
          href="http://tools.w3clubs.com/pagr/1.expires.css">
</head>

<!--[if IE 6]> <body class="ie6"> <![endif]-->
<!--[if !IE]><!--> <body> <!--<![endif]-->

Test page, webpagetest results. Works as advertised, no more blocking.

The second test uses empty comment and conditional stylesheet. In this case I even put the empty comment way at the top. Sort of like declaring upfront - hey this page uses conditional comments and the empty comment is the solution to the blocking effect.

<!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.01//EN"
   "http://www.w3.org/TR/html4/strict.dtd">
<!--[if IE 6]><![endif]-->

<html>
<head>
  <title>base page</title>
  <link type="text/css" rel="stylesheet"
        href="http://tools.w3clubs.com/pagr/1.expires.css">
  <!--[if IE 6 ]>
    <link type="text/css" rel="stylesheet"
          href="http://tools.w3clubs.com/pagr/2.expires.css">
  <![endif]-->
</head>

Test page, webpagetest results. No more blocking :)

Answering Andrea's question - what if you have several conditionals, for IE6, IE7 and so on - I did one more test with two conditional CSS files. Turns out it's fine, as long as there's one conditional comment before the CSS. I actually updated the comment so it checks for all IE versions.

Test page, results, code:

<!--[if IE]><![endif]-->
<html lang="en">
<head>
    <title>base page</title>
    <link type="text/css" rel="stylesheet"
          href="http://tools.w3clubs.com/pagr/1.expires.css">
    <!--[if IE 6]>
        <link type="text/css" rel="stylesheet"
              href="http://tools.w3clubs.com/pagr/2.expires.css">
    <![endif]-->
    <!--[if IE 7]>
        <link type="text/css" rel="stylesheet"
              href="http://tools.w3clubs.com/pagr/3.expires.css">
    <![endif]-->
</head>

In conclusion #2... conditional comments cause CSS to block. The workaround is to have an extra empty comment before the blocking CSS, or, to be safe, right after the doctype. Even better - don't use conditional comments at all. Except for browser-sniffing and loading two complete and completely separate CSS files, not just the IE fixes.
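To make the rule of thumb concrete, here's a rough JavaScript sketch that looks at a page's source and guesses whether it would hit the blocking case. The function name likelyBlocksDownloads is made up for this example, and the heuristic only encodes what the tests above showed in IE8: a stylesheet first, then a conditional comment, with no conditional comment before the stylesheet.

```javascript
// Hypothetical helper based only on the test results described in this post.
function likelyBlocksDownloads(html) {
    // position of the first conditional comment, e.g. <!--[if IE 6]>
    var firstCC = html.search(/<!--\[if\s[^\]]*\]>/i);
    // position of the first stylesheet <link>
    var firstCSS = html.search(/<link\b[^>]*rel=["']?stylesheet/i);

    // blocking was only observed when a stylesheet comes first
    // and a conditional comment follows it
    return firstCC !== -1 && firstCSS !== -1 && firstCSS < firstCC;
}
```

An empty conditional comment right after the doctype moves the first conditional comment ahead of the stylesheet, so the check returns false, same as for the browser-sniffing pattern.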

 

YUI CSS min - part 3 - hacks

Friday, May 21st, 2010

The previous parts are here (building and testing) and here (what gets minified). Now let's see how YUI CSS min handles CSS hacks.

As you know, CSS hacks often use errors in the browsers' CSS parsers to target specific browser versions and supply additional rules to work around other issues in said browsers. That makes any CSS tool's job slightly more challenging. Not only does the tool have to avoid repeating the browsers' errors, but it also has to understand what the browsers got wrong and support it too. Fun stuff. Isn't it a joy being a web developer?

So here are some hacks that are tested to work with the YUICompressor's CSS min.

Underscore/star hack

The simplest ever hack to target IE6 and IE7. In the example below normal browsers see 1px, dropping _width and *width as invalid. IE7 ignores the *, drops _width as invalid and sees 3pt. IE6 ignores the _, treats _width as width, and so sees 2em.

CSS min doesn't parse and doesn't understand CSS properties, so it accepts pretty much any property.

Before:

#element {
    width: 1px;
    *width: 3pt;
    _width: 2em;
}

After:

#element{width:1px;*width:3pt;_width:2em}

Child selector hack

CSS min strips comments but there is this child selector hack people use to hide declarations from IE7 and below.

CSS min retains empty comments that immediately follow > (thanks go out to Chris Burroughs)

Before:

html >/**/ body p {
    color: blue;
}

After:

html>/**/body p{color:blue}

IE5/Mac hack

This hack targets IE5/Mac, if anyone still worries about this browser. The hack is retained after minification, only it's minified.

Before:

/* Ignore the next rule in IE mac \*/
.selector {
   color: khaki;
}
/* Stop ignoring in IE mac */

After:

/*\*/.selector{color:khaki}/**/

Box model hack

This hack uses valid CSS and there's no special use of comments so it's retained.

Before:

#elem {
    width: 100px; /* IE */
    voice-family: "\"}\"";
    voice-family:inherit;
    width: 200px; /* others */
}
html>body #elem {
    width: 200px; /* others */
}

After:

#elem{width:100px;voice-family:"\"}\"";voice-family:inherit;width:200px}html>body #elem{width:200px}

Seems like the code highlighter chokes here though. It ain't easy :)

That's all, folks!

Thanks and please, feel free to suggest improvements and report bugs. Also play with the web UI of the JS-version here to see for yourself what it does to your code.

 

YUI CSS Min - part 2

Thursday, May 20th, 2010

The first part is here. It was more about building the YUICompressor, writing and running test cases. Now let's see what the compressor does exactly to your CSS.

BTW, you can play with the web UI to see for yourself how the minifier works.

Stripping comments and white space

This is the bare minimum a minifier can do. And when it comes to CSS, this is also the place where the biggest improvement comes from. In JS for example you can rename variables and save bytes, but in CSS the possibilities are more limited. No shorter way to say text-decoration, unfortunately.

So before:

/*****
  classmates stuff
*****/
.classmates {
    /* after 10 years */
    weight: considerable;
}

After:

.classmates{weight:considerable}
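For a rough idea of what this step involves, here's a minimal JavaScript sketch. It is not the YUI Compressor's actual code: the function name stripCommentsAndSpace is made up for illustration, and it ignores edge cases such as comment-like text inside strings.

```javascript
// Hypothetical helper, not the real YUI Compressor implementation.
function stripCommentsAndSpace(css) {
    return css
        // drop /* ... */ comments (non-greedy, may span several lines)
        .replace(/\/\*[\s\S]*?\*\//g, '')
        // collapse runs of white space into a single space
        .replace(/\s+/g, ' ')
        // no spaces needed around braces, semi-colons and commas
        .replace(/\s*([{};,])\s*/g, '$1')
        // no space needed after a colon; a space *before* it is kept,
        // because "p :link" and "p:link" are different selectors
        .replace(/:\s+/g, ':')
        // the last semi-colon in a block is optional
        .replace(/;}/g, '}')
        .trim();
}
```

Run on the "before" snippet above, this sketch produces the same one-liner as the "after".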

Special comments

Stripping comments is nice but not always ok. Sometimes you need to retain copyright information. Use ! at the beginning of the comment to mark the comment as special.

Before:

/*!
  (c) copyright copyleft
*/
.classmates {
    /* after 10 years */
    weight: considerable;
}

After:

/*!
  (c) copyright copyleft
*/.classmates{weight:considerable}

Thanks to the charmingly insistent Billy Hoffman and the valid case he presented, the bang (!) itself is preserved too. This way you can safely double minify. Also lint tools (such as Zoompf, YSlow and PageSpeed) can see the ! and conclude that this comment is there intentionally, not because you forgot to minify.
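A sketch of how the "keep the bang comments" rule could look in JavaScript. Again, this is just an illustration under my own assumptions (the name stripOrdinaryComments is invented), not what the compressor does internally.

```javascript
// Hypothetical helper: removes ordinary comments, keeps /*! ... */ ones intact.
function stripOrdinaryComments(css) {
    return css.replace(/\/\*(!?)[\s\S]*?\*\//g, function (comment, bang) {
        // "bang" is "!" for special comments and "" for all others
        return bang ? comment : '';
    });
}
```

Because the ! stays in the output, running the function a second time leaves the special comment alone, which is the double-minify case mentioned above.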

Stripping last semi-colon

The last semi-colon in a declaration block is out. So keep it in your source for maintenance purposes and let the minifier take care of stripping it out.

Before:

a {
  one: 1;
  two: 2;
}

After:

a{one:1;two:2}

Extra semi-colons

One semi-colon is all you need, so the minifier will strip an accidentally added one.

Before:

p :link {
  ba: zinga;;;
  foo: bar;;;
}

After:

p :link{ba:zinga;foo:bar}

No empty declarations

Empty declaration blocks don't do anything, so why send them over the net?

Before:

.empty { ;}

After:

(nothing...)
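The three semi-colon and empty-block rules can be sketched with a few regular expressions. This is a simplified illustration that assumes comments and white space are already gone; tidyBlocks is a made-up name and the real minifier may go about it differently.

```javascript
// Hypothetical helper operating on CSS that has no comments or extra spaces.
function tidyBlocks(css) {
    return css
        // ";;;" -> ";"
        .replace(/;;+/g, ';')
        // drop the last semi-colon in a declaration block
        .replace(/;}/g, '}')
        // "selector{}" has nothing to say, so remove the whole rule
        .replace(/[^{}]+\{\}/g, '');
}
```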

Zero values

A zero is a zero. Zero pixels or % or centimeters or whatever, it's still zero. Also sometimes (when everything is a zero) you need just one zero instead of four, or three or two.

Before:

a {
  margin: 0px 0pt 0em 0%;
  background-position: 0 0ex;
  padding: 0in 0cm 0mm 0pc
}

After:

a{margin:0;background-position:0 0;padding:0}
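Here's one way the zero rules could be sketched in JavaScript, assuming white space has already been stripped. The name minifyZeros is invented for the example. Note the special case at the end: the example above keeps background-position at two values, so the sketch puts the second zero back after collapsing.

```javascript
// Hypothetical helper operating on CSS that has no comments or extra spaces.
function minifyZeros(css) {
    return css
        // "0px", "0em", "0%"... -> "0"; the character before the zero is
        // checked so that values like "10px" are left alone
        .replace(/(^|[\s:,(])0(?:px|pt|em|ex|in|cm|mm|pc|%)/g,
            function (match, prefix) {
                return prefix + '0';
            })
        // ":0 0", ":0 0 0" and ":0 0 0 0" -> ":0"
        .replace(/:0(?: 0){1,3}(;|})/g, ':0$1')
        // background-position keeps two values, as in the example above
        .replace(/background-position:0(;|})/g, 'background-position:0 0$1');
}
```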

Floats

For values such as 0.something, the 0 is not needed.

Before:

::selection {
  margin: 0.6px 0.333pt 1.2em 8.8cm;
}

After:

::selection{margin:.6px .333pt 1.2em 8.8cm}
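The same idea as a small JavaScript sketch (stripLeadingZero is a made-up name, and this is an illustration rather than the compressor's code):

```javascript
// Hypothetical helper: "0.6px" -> ".6px", while "10.5px" and "1.2em" stay as they are.
function stripLeadingZero(css) {
    return css.replace(/(^|[\s:,(])0+\.(\d+)/g, function (match, prefix, digits) {
        // keep whatever came before the zero, drop the zero itself
        return prefix + '.' + digits;
    });
}
```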

Color values

RGB color values are nice, but not the most concise form. Make them hex. Also AABBCC hex can be the shorter ABC. But don't touch RGBA and don't touch the IE filter values in quotes.

Before:

.color {
  me: rgb(123, 123, 123);
  impressed: #ffeedd;
  background: none repeat scroll 0 0 rgb(255, 0,0);
}

After:

.color{me:#7b7b7b;impressed:#fed;background:none repeat scroll 0 0 #f00}

Before:

.cantouch {
  alpha: rgba(1, 2, 3, 4);
  filter: chroma(color="#FFFFFF");
}

After (no color minification) :

.cantouch{alpha:rgba(1,2,3,4);filter:chroma(color="#FFFFFF")}
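A rough JavaScript sketch of the color rules. The name minifyColors is hypothetical, the code doesn't validate that the rgb() numbers are in the 0-255 range, and the "inside quotes" check only looks at the character right before the #. The actual minifier may handle these cases differently.

```javascript
// Hypothetical helper illustrating the rgb() -> hex and #aabbcc -> #abc rules.
function minifyColors(css) {
    return css
        // rgb(255, 0,0) -> #ff0000; "rgba(" never matches because the
        // opening parenthesis has to follow "rgb" directly
        .replace(/rgb\(\s*(\d+)\s*,\s*(\d+)\s*,\s*(\d+)\s*\)/g,
            function (match, r, g, b) {
                return '#' + [r, g, b].map(function (n) {
                    var hex = Number(n).toString(16);
                    return hex.length === 1 ? '0' + hex : hex;
                }).join('');
            })
        // #aabbcc -> #abc, unless the color directly follows a quote
        // (as in the IE filter values)
        .replace(/(["']?)#([0-9a-f])\2([0-9a-f])\3([0-9a-f])\4\b/gi,
            function (match, quote, a, b, c) {
                return quote ? match : '#' + a + b + c;
            });
}
```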

Single charsets

Only one charset is allowed per stylesheet. So, if there's more than one, strip it. It may happen when merging several stylesheets into one.

Before:

@charset "utf-8";
#foo {
  border-width: 1px;
}

/* second css, merged */
@charset "another one";
#bar {
  border-width: 10px;
}

After:

@charset "utf-8";#foo{border-width:1px}#bar{border-width:10px}
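Keeping only the first charset can be sketched like this (keepFirstCharset is an invented name; this is an illustration of the rule, not the tool's source):

```javascript
// Hypothetical helper: the first @charset stays, any later ones are dropped.
function keepFirstCharset(css) {
    var seen = false;
    return css.replace(/@charset\s+"[^"]*";/gi, function (match) {
        if (seen) {
            return '';
        }
        seen = true;
        return match;
    });
}
```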

Alpha opacity

There's a shorter way to write opacity filter for IE.

So before:

code {
   -ms-filter: "PROGID:DXImageTransform.Microsoft.Alpha(Opacity=80)"; /* IE 8 */
   filter: progid:DXImageTransform.Microsoft.Alpha(Opacity=80);       /* IE 4-7 */
}

After:

code{-ms-filter:"alpha(opacity=80)";filter:alpha(opacity=80)}

There are more filters that could be shortened besides the opacity, but MSDN suggests the longer syntax should be used, so a bit more experimentation is needed here...

Thanks!

Whew, sort of a lengthy post. Thank you for reading and coming up next time... hacks :)

If you have ideas or comments, let me know. Bug reports are welcome too.

 

15 minutes could save you…

Monday, May 17th, 2010

Since I have a ton of things to do, I decided it was about time to spend some time with this blog, performance optimization-wise. Not do anything special, just the bare minimum, the no-brainer, works-every-time, easy stuff. And I'm quite happy with the results.

I only looked at the homepage, although the results will be seen throughout the site. Unfortunately there was a youtube video on the homepage, otherwise the results would have been even better, KB-wise.

gzip on

First things first - turning compression on. I've previously whined about the host of this blog, site5.com, and how I wasn't able to enable compression for some tests and had to make PHP do the compression. Turned out all it takes is to ask by opening a support ticket. Second level support decided to compile Apache with mod_deflate and I was good to go. I put this in .htaccess:

AddOutputFilterByType DEFLATE text/css text/plain text/xml application/javascript application/json

This blog has no JavaScript, just HTML and CSS. The HTML became 5K (from 23K) and CSS became 3 something (from 11K)

Flush

Decided to go fancy here and do first byte flush early on. That, of course, can be problematic, especially on shared hosting. I mean it's problematic to get gzip working together with flushing (tips). Eventually I gave up wrestling with .htaccess and php.ini settings and let PHP handle it. That's why you may notice that text/html is missing from the DEFLATE list above.

So all it took was going to my WordPress theme, finding something called header.php and adding two lines of PHP.

One at the top:

<?php ob_start("ob_gzhandler"); ?>
<!DOCTYPE html ...

And one at the bottom:

<?php ob_flush(); flush(); ?>

Now the header part is flushed in one chunk and the rest in another. Check it in chunkview...

favicon.ico

I have no favicon so I get a lot of 404s, since browsers insist on downloading this little thing. So I created one. Took a social profile photo, cropped (Option+K) and played with green colors in Mac's built-in Preview program. Then, ImageMagick:

$ convert -resize 16x16 dude.png PNG8:favicon16.png
$ convert favicon16.png favicon.ico

FTP. Done. No mo' 404s for favicon.

Minifying CSS

Copy-paste into CSSMin and CSS lost 30% of its weight. Now after gzipping and minifying, the CSS went from total 11.3K to 2.6K.

Cover images

I have some book covers on the homepage. One was simply linked to the publisher's site (extra DNS, connection...). Also turned out the publisher has redesigned the site so there was also a redirect. Disaster. Also the image was a 26K PNG where it could be a 4K JPEG.

$ convert cover.png cover.png.jpg
$ jpegtran -copy none -optimize cover.png.jpg > cover.jpg

And just like that - 4 book covers were optimized.

Before/after

And that's all I did. I can probably do Expires headers too, convert/sprite all smiley GIFs and so on but that means touching more of WordPress, so I stopped here.

Now the page loads (onload) in 1.2 seconds, down from 2.2 (45% faster).

There are fewer DNS lookups, no 404s, no 301s

Page weight is down from 285K to 186K (34%). Actually if you exclude the youtube 142K SWF, the result is: 143K (before) to 44K (after) or a page weight saving of 70%. Not bad, not bad at all.
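If you want to double-check the percentages, here's the arithmetic as a tiny JavaScript sketch (percentSaved is just a name I picked). The numbers in the text are rounded by hand, so they differ slightly from the one-decimal results.

```javascript
// Relative saving between a "before" and an "after" measurement.
function percentSaved(before, after) {
    return ((before - after) / before * 100).toFixed(1) + '%';
}

console.log(percentSaved(2.2, 1.2)); // onload time in seconds -> "45.5%"
console.log(percentSaved(285, 186)); // page weight in K -> "34.7%"
console.log(percentSaved(143, 44));  // page weight without the SWF -> "69.2%"
```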

And the waterfalls. Before:

After:

PageSpeed score only went from 84/100 to 87/100 which was not impressive at all. I think 84 may have been too generous, but maybe not, given how much worse some sites out there are.

So this is it - 15 minutes could save you 45% page load time and 70% download sizes :)

 

Browser sniffing with conditional comments

Thursday, May 13th, 2010

Browser sniffing is bad. But sometimes unavoidable. But doing it on the server is bad, because the UA string is unreliable. The solution is to use conditional comments and let IE do the work. Because you're targeting IE most of the time anyway.

In fact IE8 is a decent browser for the most practical purposes and often you're just targeting IE before 8.

Conditional comments in practice use the following pattern:

  1. Load the decent browsers CSS
  2. Conditionally load IE6,7 overrides

The drawback is that IE6,7 get two HTTP requests. That's not good. Another drawback is that having a separate IE-overrides stylesheet is an excuse to get lazy and instead of solving a problem in a creative way, you (and the team) will just keep adding to it.

We can avoid the extra HTTP request by creating our CSS bundles on the server side and having two browser-specific but complete stylesheet files:

  1. The decent browsers CSS
  2. The complete CSS for IE6,7 not only the overrides

Then the question is loading one of the two conditionally without server-side UA sniffing. The trick (courtesy of duris.ru) is to use conditional comments to comment out the decent CSS so it's not loaded at all:

<!--[if lte IE 7]>
  <link href="IE67.css" rel="stylesheet" type="text/css" />
<![endif]-->
<!--[if gt IE 7]><!-->
  <link href="decent-browsers.css" rel="stylesheet" type="text/css" />
<!--<![endif]-->

The highlighting suggests what the decent browsers see.

IE6,7 see something like this after the conditional comments are processed:

  <link href="IE67.css" rel="stylesheet" type="text/css" />
<!--
  <link href="decent-browsers.css" rel="stylesheet" type="text/css" />
-->
 

Preload CSS/JavaScript without execution

Wednesday, April 21st, 2010

Preloading components in advance is good for performance. There are several ways to do it. But even the cleanest solution (open up an iframe and go crazy there) comes at a price - the price of the iframe and the price of parsing and executing the preloaded CSS and JavaScript. There's also a relatively high risk of potential JavaScript errors if the script you preload assumes it's loaded in a page different than the one that preloads.

After a bit of trial and a lot of error I think I came up with something that could work cross-browser:

  • in IE use new Image().src to preload all component types
  • in all other browsers use a dynamic <object> tag

Code and demo

Here's the final solution, below are some details.

In this example I assume the page prefetches after onload some components that will be needed by the next page. The components are a CSS, a JS and a PNG (sprite).

window.onload = function () {

    var i = 0,
        max = 0,
        o = null,

        // list of stuff to preload
        preload = [
            'http://tools.w3clubs.com/pagr2/<?php echo $id; ?>.sleep.expires.png',
            'http://tools.w3clubs.com/pagr2/<?php echo $id; ?>.sleep.expires.js',
            'http://tools.w3clubs.com/pagr2/<?php echo $id; ?>.sleep.expires.css'
        ],
        isIE = navigator.appName.indexOf('Microsoft') === 0;

    for (i = 0, max = preload.length; i < max; i += 1) {

        if (isIE) {
            new Image().src = preload[i];
            continue;
        }
        o = document.createElement('object');
        o.data = preload[i];

        // IE stuff, otherwise 0x0 is OK
        //o.width = 1;
        //o.height = 1;
        //o.style.visibility = "hidden";
        //o.type = "text/plain"; // IE 
        o.width  = 0;
        o.height = 0;

        // only FF appends to the head
        // all others require body
        document.body.appendChild(o);
    }

};

A demo is here:
http://phpied.com/files/object-prefetch/page1.php?id=1
In the demo the components are delayed by 1 second each and sent with an Expires header. Feel free to increment the ID for a new test with uncached components.

Tested in FF3.6, O10, Safari 4, Chrome 5, IE 6,7,8.

Comments

  • new Image().src doesn't do the job in FF because it has a separate cache for images. Didn't seem to work in Safari either where CSS and JS were requested on the second page where they should've been cached
  • the dynamic object element has to be outside the head in most browsers in order to fire off the downloads
  • dynamic object works also in IE7,8 with a few tweaks (commented out in the code above) but not in IE6. In separate tests I've also found the object element to be expensive in IE in general.
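If you'd rather have the loop from the demo as a reusable function, here's one possible refactoring. The env parameter is my own assumption, added so the browser-specific pieces (the IE flag, Image and document) are passed in and the logic can be exercised outside a browser. The behavior is meant to match the demo code, nothing more.

```javascript
// Hypothetical refactoring of the demo loop; "env" is an assumption made for
// this sketch and holds { isIE, Image, document }.
function preload(urls, env) {
    var i, o;
    for (i = 0; i < urls.length; i += 1) {
        if (env.isIE) {
            // IE: an image request is enough to get any component cached
            new env.Image().src = urls[i];
            continue;
        }
        // all other browsers: a 0x0 dynamic object element
        o = env.document.createElement('object');
        o.data = urls[i];
        o.width = 0;
        o.height = 0;
        // appended to the body, since only FF fires the download from the head
        env.document.body.appendChild(o);
    }
}
```

In a page you would call it after onload with something like preload(list, {isIE: navigator.appName.indexOf('Microsoft') === 0, Image: window.Image, document: document}).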

That's about it. Below are some unsuccessful attempts I tried which failed for various reasons in different browsers.

Other unsuccessful attempts

1.
I was actually inspired by this post by Ben Cherry where he loads CSS and JS in a print stylesheet. Clever hack, unfortunately didn't work in Chrome which caches the JS but doesn't execute it on the next page.

2.
One of the comments on Ben's post suggested (Philip and Dejan said the same) using invalid type attribute to prevent execution, e.g. text/cache.

var s = document.createElement('script');
s.src = preload[1];
s.type = "text/cache";
document.getElementsByTagName('head')[0].appendChild(s);

That worked for the most part but not in FF3.6 where the JavaScript was never requested.

3.
A dynamic link prefetch didn't do anything, not even in FF which is probably the only browser that supports this.

for (i = 0, max = preload.length; i < max; i += 1) {
    var link = document.createElement('link');
    link.href = preload[i];
    link.rel = "prefetch";
    document.getElementsByTagName('head')[0].appendChild(link);
}

Then it took a bit of trial/error to make IE7,8 work with an object tag, before I stumbled into IE6 and gave up in favor of image src.

In conclusion

I believe this is a solution I could be comfortable with, although it involves user agent sniffing. It certainly looks less hacky than loading JS as CSS anyways. And object elements are meant to load any type of component so no semantic conflict here I don't believe. Feel free to test and report any edge cases or browser/OS combos. (JS errors in IE on the second page are ok, because I'm using console.log in the preloaded javascript)

Thanks for reading!

 

IE9 and JPEG-XR: first impressions

Monday, April 5th, 2010

One of the new features in IE9 is the support for the JPEG-XR format, which reportedly has a better compression. Is it something we should dive into ASAP?

JPEG-XR

The wikipedia article is here. This format is developed and patented (red flag!) by Microsoft (yellow flag! :) ), it replaces the suggested JPEG-2000 format and is an official standard as of mid '09. It's formerly known as "HD photo" and "Microsoft something something" and is heavily used in Vista and Windows 7 - two OSes I'm yet to experience. Anyway.

The wikipedia article says that the format has a better compression, so I had to take a look at it!

Software support

That's where the problems start. The list of software that supports the format is not too long and most of the software just reads it, like IE9. IE9 doesn't run on XP, so I couldn't actually test it.

What I managed to try was:

  • a plugin for Paint.NET that read/writes JPEG-XR
  • a 60-day trial version of MS' Expression Design 3 that reads/writes - this required an upgrade of my .Net platform, so the whole process took forever on the poor XP VMWare (but hey, I have a book to finish, so no distraction is too long!)
  • a plugin for IrfanView that reads the format

Nothing for Mac though.

There are rumors all over the place that MS have a beta version of a Photoshop plugin that works on the Mac, but even the MS press release linked to a 404 (which redirected to a bing search - here's the secret to gaining search market share!).

Actually before I dug into these programs, my first instinct was to check if ImageMagick supports this format. Turns out no. IMagick uses libjpeg like pretty much all image programs out there and curiously enough here's what libjpeg's README has to say:

FILE FORMAT WARS
================

The ISO JPEG standards committee actually promotes different formats like
JPEG-2000 or JPEG-XR which are incompatible with original DCT-based JPEG
and which are based on faulty technologies.  IJG therefore does not and
will not support such momentary mistakes (see REFERENCES).
We have little or no sympathy for the promotion of these formats.  Indeed,
one of the original reasons for developing this free software was to help
force convergence on common, interoperable format standards for JPEG files.
Don't use an incompatible file format!
(In any case, our decoder will remain capable of reading existing JPEG
image files indefinitely.)

Sounds like another red flag to me. IJG (Independent JPEG Group) are the creators of libjpeg. If it doesn't sound like libjpeg will support JPEG-XR, that means adoption can be really slow, if feasible at all. But even if IE is the only browser under the sun that supports the format and the format is so much better, then there might be cases where it could be beneficial to browser-sniff and send different image versions, as far-fetched as that may sound.

Test in Paint.NET

I started with Paint.NET because it was the easiest. I took a photo I've taken with the iPhone, keeping the use case real, and resized it to 600x450px which sounds like a normal thing to do in a blog post for example. I used IrfanView and PNG, so that the original is lossless (click the thumb for the actual source image).

I converted the photo with Paint.NET to JPEG and to JPEG-XR (also called WDP/HD Photo). In both cases I used quality of 80%. There was also an option for WDP which was 32bit image by default, which I changed to 24 bits because the resulting file was smaller.

WDP export in Paint.NET

The results were - 45K for XR/WDP and 24K for JPEG. So the good old JPEG was smaller - the exact opposite of what should've happened. Additionally JPEGTran shaved off another 1.3K from the file. Seemed like JPEG-XR is not that good after all. But as I said I had a book to write so I kept going with the distractions, determined to avoid writing for as long as I can.

Test with Expression Design

Expression Design produced the exact same WDP/HD Photo/JPEG-XR file - 45K. And this is not surprising actually, since there is an image framework from MS, called WIC, part of .Net, which is probably what Paint.NET and Expression Design both use. But surprisingly enough the JPEG outcome from Expression Design was significantly bigger - 57K. What?!

Then I looked at the visual quality and the number of colors and it turned out the JPEGs were pretty different, although they were converted from the same PNG and using 80% in both programs.

Software             JPEG                     JPEG-XR (aka WDP, aka HD Photo)
Paint.NET            24K (50 000+ colors)     45K (104 000+ colors)
Expression Design    57K (54 000+ colors)     45K (same file as Paint.NET's)

Visually the JPEG from Paint.NET is clearly lower quality than the one from Expression Design and from the WDP format. Interestingly, IrfanView produced a pretty much identical file when converting the PNG to JPEG with quality 80. So Expression Design seems to be doing something differently.

Using IrfanView I increased the quality of the JPEG until the file size reached the file size of the WDP. (After all, all I want to know is which format has the smaller filesize). The quality of 93 resulted in a JPEG that was about the same file size as the quality 80 JPEG-XR. Then I tried to look at the visual quality and although I'm not a designer, it seemed to me that the two images are pretty much identical and XR is maybe just a little better. But that's a little subjective.

Here's the two files for comparison. Let me know which one you think is better. In this case they are both losslessly converted to PNG, so all browsers can see the WDP.

Here's also an image diff (from ImageMagick's compare) - it shows that technically the two images are very different (the white dots are pixels with the exact same color values)

One other thing about Expression Design - when exporting WDP, it has a "transparency" checkbox ON. This results in bigger images, so make sure you turn it off, it makes no sense for photos.
Expression Design options

Batch conversion?

My motivation in this experiment was to see if there's a way (and a reason) to do a batch conversion of all JPEG imagery to JPEG-XR. This would be my favorite performance optimization - you run one script and wake up to 5-10% less image bandwidth.

Looks like JPEG-XR could probably look better for the same filesize, meaning maybe a smaller filesize for the same quality. But it's not easy to decide when it comes to quality and certainly even harder for a machine (a simple batch conversion script) to tell. I was hoping that there's a way to losslessly convert to JPEG-XR. From what I can see, there isn't. JPEG-XR does have a lossless option but it creates huge files (like 250K instead of 45), so the lossless versions are not meant to be on the web. BTW, the lossless option is the same as 100 quality (which is not the same in normal JPEG, where even 100% is lossy).

So, in conclusion - JPEG-XR may look promising but is currently unusable for practical purposes, because of

  • the extremely limited support in browsers (browser, actually),
  • very limited choices of creation software
  • the benefits are hard to distinguish
  • not possible to batch-and-forget process all old JPEGs

And there's the other turn off - patents. Although Microsoft has promised to promise not to sue people around for implementing JPEG-XR, a patent is a patent and all software patents must die on general principle :)

 

Publishing 5 books this year

Thursday, April 1st, 2010

So I'll be publishing 5 books this year. Isn't that incredible? Is it even possible? And good quality books at that? It's a nice challenge (my last year's challenge failed, I didn't even bother to count how bad it failed). I think it's possible, especially if you bend a little bit the meaning of "5", "year", "publishing" and "me" :)

Book #1 - High-Performance JavaScript

hpjs

Let's start bending - this is a book where I wrote just one chapter. It's a book by Nicholas Zakas with contributions from:

And I wrote my chapter mainly last year. My chapter is about the DOM. But the book became available just now, a few days ago, so it's published this year (bending, bending...)

Book #2 - JavaScript Patterns

I am hard at work on this one currently (explains the low activity on this blog). I started last year but only finished two chapters in '09. The bending part here is that I've already given presentations on the topic and have been writing a "patterns" column for JSMag for a while, so I can recycle quite a bit of content.

You can see the tentative cover, I hope it stays tentative and we can replace the hen with a nice cute little zebra (a.k.a. donkey with patterns). Between you and me, I think there's a new designer in O'Reilly with a bird fetish.

I expect the first draft for this one to complete within weeks. And no, it's not about implementing the Gang of Four patterns in JavaScript (has been done already by Ross, see above), although there's one chapter on a selected few - Singleton, Factory, Observer, Proxy, Decorator...

Book #3 - Speed Matters

I've contracted with Peachpit Press to write a book about performance targeted mainly at designers. It will be about the business (why go fast), technology (how) and psychology (perception of speed) of web performance. I'm excited about this one for a number of reasons:

  • there's a lot of misconceptions being spread around in designer blogs and books, especially sad when one of the books in question is a sort of a bible for web designers. I mean things like PNG vs. GIF, gzipping and others. I hope I can present a readable, concise and, above all, technically correct text for designers who may find Steve Souders' HPWS, a.k.a. "The Bible" a little too dry because it's from O'Reilly and has no colors
  • the publisher is considering a sort of novel approach to writing the book, fingers crossed, because I believe it's the right way to write technical books.
  • at the very least, the book will be available as early drafts while it's being written, which is new to me, but always wanted to do.
  • the book will be full color - again, new experience to me

The bending here comes from the fact that I'll try to reuse from the perf advent calendar if I can. So some content may be pre-written.

Book #4 - Object-Oriented JavaScript (2nd edition)

The bending here is obvious - it's just a second edition, not a completely new book from scratch. My goal here is:

  • address errata
  • address some excellent critiques (of this otherwise bestselling book!), such as this one by @kangax, which is the article that actually prompted me to pitch a second edition to the publisher. So many thanks to Yuri! Also thanks to Asen who's been sending me invaluable and detailed feedback on the first edition. And now thanks to Asen and Kangax (and also Dmitry) I'm spending some time lurking on the comp.lang.javascript newsgroup, which is full of great discussions.
  • ECMAScript5 update
  • some concepts such as hoisting, NFE, property attributes, etc
  • one completely new chapter on testing and docs
  • answers to the end-of-chapter exercises - an often-requested update

Hoping this title will not take a lot of time.

And since these 4 books should be finished by the end of August or thereabouts, this will give me a whole 4 months (1/3 of a year) to dive into something I've been thinking about, two things actually - CSS and self-publishing.

Book #5 - CSS for web devs

CSS is widely misunderstood by many people, me included. I'm convinced we only use a portion of all that CSS is, and use it badly. I'm not saying it will be CSS: The Good Parts, but I plan to address what I consider bad habits in CSS (mis)use and write a book as a learning experience. This is the best way to learn IMO. It will be self-published and probably available online for free too. And by self-publish I don't mean lulu.com or some of the other resellers, but working with the printer and distributor directly.

Too ambitious? April Fool's?

Probably, but with all the pre-written stuff and other cheating, it may very well be doable. Then I guess I'll take a 5 year break :)

 

YUICompressor’s CSSMin

Wednesday, March 10th, 2010

Honored to be a part of the YUI project, I am now helping with the maintenance of the CSSMin part of the YUICompressor. My changes are now part of the trunk on github, so I'm official. Next on the agenda is documenting the thing, so that's what I'll try to do here, maybe in a few posts. You know, divide and conquer.

PHP, Java and a JavaScript port

Originally written in PHP by Isaac Schlueter and ported to Java by Julien Lecomte, CSSMin got a JavaScript port by yours truly some time ago. Because, after all, JavaScript is the language of the web, isn't it?

You can play with the latest git version of the JS port online here.

I'm also happy to report that the JS port is now used in PageSpeed and YSlow (as you probably know Firefox extensions are written in JavaScript)

Page Speed

YSlow

Building

If you want to play on your own with the source version of YUICompressor without waiting for the next release, you can build it like so:

  1. Checkout or download the code from http://github.com/yui/yuicompressor/
  2. Navigate to the root yuicompressor/ directory
  3. Type ant and hit enter

In order for this to work you need a somewhat recent Java SDK installed and also Ant running. (On the Mac, just do port install apache-ant to get Ant)

This is for the Java version, the JS version needs no building, of course.

Tests

There's a bunch of new tests now (and if you want to contribute to the project, you can always write more tests and test cases for any bugs), you can run them with the suite script that Isaac wrote:

  1. cd tests/
  2. ./suite.sh

One thing I added (and loved it) is to run the tests using the JS port as well. Since the JS min part is using Mozilla's Rhino (slightly modified), Rhino is part of the code. So I'm using this already available JavaScript interpreter to run the JS port. Convenient.

The procedure to write new tests is simple:

  1. Create source CSS file in the tests/ directory, e.g. new-test.css
  2. Create a new file with the expected result and name it with a .min extension, e.g. new-test.css.min

You can use the handy-dandy online version to help with the tests creation.
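
The naming convention above (a source file plus a .min file holding the expected output) is easy to check mechanically. The sketch below is not the project's suite.sh and does not call CSSMin. It is a hypothetical Python illustration that pairs every *.css file with its *.css.min sibling and reports whether they match. The `toy_minify` function is a made-up stand-in that only strips comments and squeezes whitespace; `run_suite` and `demo` are likewise invented names.

```python
import re
import tempfile
from pathlib import Path


def toy_minify(css: str) -> str:
    """Hypothetical stand-in for a real minifier (not CSSMin's algorithm)."""
    css = re.sub(r"/\*.*?\*/", "", css, flags=re.S)   # drop comments
    css = re.sub(r"\s+", " ", css)                    # collapse whitespace
    css = re.sub(r"\s*([{}:;,])\s*", r"\1", css)      # trim around punctuation
    return css.replace(";}", "}").strip()             # drop last semicolon in a rule


def run_suite(tests_dir: Path, minify=toy_minify) -> dict:
    """Compare minify(foo.css) with the contents of foo.css.min."""
    results = {}
    for source in sorted(tests_dir.glob("*.css")):
        expected_file = source.with_name(source.name + ".min")
        if not expected_file.exists():
            continue  # no expected output yet, so nothing to compare
        expected = expected_file.read_text().strip()
        results[source.name] = (minify(source.read_text()) == expected)
    return results


def demo() -> dict:
    """Build a throwaway tests/ directory with one passing test case."""
    with tempfile.TemporaryDirectory() as tmp:
        tests = Path(tmp)
        (tests / "new-test.css").write_text("/* note */\na {\n  color: red;\n}\n")
        (tests / "new-test.css.min").write_text("a{color:red}\n")
        return run_suite(tests)


if __name__ == "__main__":
    print(demo())
```

Swapping `toy_minify` for a call into a real minifier is the only change needed to turn this into a useful check; the pairing logic stays the same.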

Next time

With those details out of the way, the next time I'll talk more about the different things that CSSMin does to your CSS code. Thanks for reading!

 

Uncompressed data in base64? Probably not

Thursday, February 4th, 2010

The beauty of experimentation is that failures are just as fun as successes. Warning: this post is about a failure, so you can skip it altogether :)

The perf advent calendar was my attempt to flush out a bunch of stuff, tools and experiments I was doing but never had the time to talk about. I guess 24 days were not enough. Here's another little experiment I made some time ago and forgot about. Let me share before it disappears to nothing with the next computer crash.

I've talked before about base64-encoded data URIs. I mentioned that according to my tests base64 encoding adds on average 33% to the file size, but gzipping brings it back, sometimes to less than the original.
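
The 33% figure follows from how base64 works: every 3 input bytes become 4 output characters. A small Python sketch makes both halves of the claim visible, the overhead and what gzip gives back. The sample data is an assumption: seeded pseudo-random bytes stand in for already-compressed image data, which real PNG bytes only approximate, so the exact gzipped number will differ for real files.

```python
import base64
import gzip
import random


def sizes(raw: bytes) -> dict:
    """Return byte counts for raw data, its base64 form, and the gzipped base64."""
    encoded = base64.b64encode(raw)
    return {
        "raw": len(raw),
        "base64": len(encoded),
        "base64_gzipped": len(gzip.compress(encoded, compresslevel=9)),
    }


# Seeded pseudo-random bytes stand in for already-compressed image data
# (an assumption: real PNG bytes are not perfectly random).
rng = random.Random(0)
sample = bytes(rng.getrandbits(8) for _ in range(3000))

if __name__ == "__main__":
    print(sizes(sample))
```

For 3000 input bytes the base64 form is exactly 4000 characters. Since base64 uses only 64 distinct characters, gzip can code each one in roughly 6 bits instead of 8, which is where most of the overhead goes away.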

Then I saw a comment somewhere (reddit? hackernews?) that the content before base64-encoding had better be uncompressed, because it will be gzipped better after that. It made sense, so I had to test.

"Whoa, back it up... beep, beep, beep" (G. Costanza)

When using data URIs you essentially do this:

  1. take a PNG (which contains compressed data),
  2. base64 encode it
  3. shove it into a CSS
  4. serve the resulting CSS gzipped (compressed)

See how it goes: compress - encode - compress again. Compressing already compressed data doesn't sound like a good idea, so it sounds believable that skipping the first compression might give better results. Turns out it's not exactly the case.

Uncompressed PNG?

The PNG format contains information in "chunks". At the very least there's header (IHDR), data (IDAT) and end (IEND) chunks. There could be other chunks such as transparency, background and so on, but these three are required. The IDAT data chunk is compressed to save space, but it looks like it doesn't have to be.
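
To make the chunk layout concrete, here is a small Python sketch that builds a 1x1 grayscale PNG in memory and then walks its chunks. Each chunk is a 4-byte big-endian length, a 4-byte type, the data, and a 4-byte CRC. The helper names (`make_chunk`, `tiny_png`, `list_chunks`) are made up for this illustration, and the tiny image exists only so the example has something to parse.

```python
import struct
import zlib

PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"


def make_chunk(kind: bytes, data: bytes) -> bytes:
    """Serialize one chunk: length, type, data, CRC over type + data."""
    crc = zlib.crc32(kind + data) & 0xFFFFFFFF
    return struct.pack(">I", len(data)) + kind + data + struct.pack(">I", crc)


def tiny_png() -> bytes:
    """Build a 1x1 8-bit grayscale PNG in memory (test data for this sketch only)."""
    ihdr = struct.pack(">IIBBBBB", 1, 1, 8, 0, 0, 0, 0)
    scanline = b"\x00\x00"  # filter type 0, then one black pixel
    return (PNG_SIGNATURE
            + make_chunk(b"IHDR", ihdr)
            + make_chunk(b"IDAT", zlib.compress(scanline))
            + make_chunk(b"IEND", b""))


def list_chunks(png: bytes) -> list:
    """Return (type, data length) for every chunk in a PNG byte string."""
    if not png.startswith(PNG_SIGNATURE):
        raise ValueError("not a PNG file")
    chunks, pos = [], len(PNG_SIGNATURE)
    while pos < len(png):
        (length,) = struct.unpack(">I", png[pos:pos + 4])
        kind = png[pos + 4:pos + 8].decode("ascii")
        chunks.append((kind, length))
        pos += 12 + length  # 4 length + 4 type + data + 4 CRC
    return chunks


if __name__ == "__main__":
    print(list_chunks(tiny_png()))
```

Running `list_chunks` on a real file (read with `open(path, "rb").read()`) shows which optional chunks it carries besides the three required ones.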

PNGOut has an option to save uncompressed data, like
$ pngout -s4 -force file.png

This is what I tried - took several compressed PNGs, uncompressed them (with PNGOut's -s4), then encoded both versions with base64, put them in CSS, gzipped the CSS and compared file sizes.

Code

<?php
// images to work with
$images = array(
  'html.png',
  'at.png',
  'app.png',
  'engaged.png',
  'button.png',
  'pivot.png'
);
//$images[] = 'sprt.png';
//$images[] = 'goog.png';
//$images[] = 'amzn.png';
//$images[] = 'wiki.png';

// css strings to write to files
$css1 = "";
$css2 = "";

foreach ($images as $i) {

  // create a "d" file, d as in decompressed
  copy($i, "d$i");
  $cmd = "pngout -s4 -force d$i";
  exec($cmd);

  // selector
  $sel = str_replace('.png', '', $i);

  // append new base64'd image 
  $file1 = base64_encode(file_get_contents($i));
  $css1 .= ".$sel {background-image: url('data:image/png;base64,$file1');}\n";
  $file2 = base64_encode(file_get_contents("d$i"));
  $css2 .= ".$sel {background-image: url('data:image/png;base64,$file2');}\n";

}

// write and gzip files
file_put_contents('css1.css', $css1);
file_put_contents('css2.css', $css2);
exec('gzip -9 css1.css');
exec('gzip -9 css2.css');

?>
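
For anyone without PNGOut at hand, the same compress - encode - compress pipeline can be tried in miniature. The Python sketch below is a rough stand-in, not a port of the script above: zlib at level 9 plays the role of a normal PNG's compressed data and zlib at level 0 (stored, no compression) plays the role of the -s4 output. The sample "pixels" are made up and no real PNG is produced, so the numbers it prints say nothing about real images; it only shows the moving parts.

```python
import base64
import gzip
import zlib


def css_payload_size(raw_pixels: bytes, inner_level: int) -> int:
    """Deflate raw data at inner_level, base64 it, wrap it in CSS, gzip the CSS."""
    inner = zlib.compress(raw_pixels, inner_level)   # stands in for the PNG's IDAT data
    encoded = base64.b64encode(inner).decode("ascii")
    css = ".img {background-image: url('data:application/octet-stream;base64,%s');}\n" % encoded
    return len(gzip.compress(css.encode("ascii"), compresslevel=9))


def compare(raw_pixels: bytes) -> dict:
    """Gzipped CSS size when the inner data is compressed (9) vs. stored (0)."""
    return {
        "compressed_first": css_payload_size(raw_pixels, 9),
        "stored_first": css_payload_size(raw_pixels, 0),
    }


# Made-up sample: a 64x64 "image" with a simple horizontal gradient.
sample_pixels = bytes((x * 4) % 256 for _ in range(64) for x in range(64))

if __name__ == "__main__":
    print(compare(sample_pixels))
```

Which variant wins depends entirely on the input data, so treat the output as something to experiment with rather than a result.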

Results

I tried to keep the test reasonable and used real life images - first the images that use base64 encoding in Yahoo! Search results. Then kept adding more files to grow the size of the result CSS - added Y!Search sprite, Google sprite, Amazon sprite and Wikipedia logo.

Test                                      | With compressed PNG, bytes | With uncompressed PNG, bytes | Difference, %
Y!Search images                           | 700                        | 1506                         | 54%
previous + Y!Search sprite                | 5118                       | 8110                         | 36%
previous + Google sprite                  | 27168                      | 40836                        | 33%
previous + Amazon sprite + Wikipedia logo | 55804                      | 79647                        | 29%
Clearly starting with compressed images is better. Looks like the difference becomes smaller as the file sizes increase. It's possible that for very big files starting with an uncompressed image could be better, but shoving more than 50K of images inline into a CSS file seems to be missing the idea of data URIs. I believe the idea is to use data URIs (instead of sprites) for small decoration images. If an image is over 50K it better be a separate request and cached, otherwise a small CSS tweak will invalidate the cached images.

 

One-click Minifier Gadget (OMG) - initial checkin

Sunday, January 31st, 2010

So I've been thinking and talking to folks about this idea of having one-stop shop for all your minification needs. Minification of JS and CSS as well as image optimization helps site performance by reducing download sizes. This is good. But not a lot of people do it.

People don't do it, because it's a PITA :) It's simple enough, but with deadlines upon you and all that, you don't want to do an extra step. That's why having a build process helps, by automating this. But setting up a build process is yet another PITA. So it goes.

So my idea was to help busy designers and developers who wouldn't invest their time researching which minifiers are good, downloading, setting up, learning about the 10+ PNG optimization tools... That's how the idea for the one-click OMG tool came about. (One-drag is more appropriate, come to think of it...) One tool that runs on all operating systems - Win, Mac, Linux - and delivers all minification and optimization tools you need as one package.

Running

Running the tool is as simple as drag/dropping a bunch of files and directories. Here I've dropped "wordpress" directory. The tool recursively looks into the dropped files for things it can optimize. More information here.

OMG screenshot

Download

Version 0.0.1 is here. It doesn't do image optimization, only JS and CSS minification, but please feel free to download and give it a shot. Unzip the package for your OS and run omg.exe (Windows), OMG.app (Mac), or the omg binary (Linux).

Open source

The code is on GitHub. Fork and enjoy.

The developer's notes are there too - how to set up, run, package. Also a list of todos if you want to help.

Next?

This is just a preliminary version. Feel free to join, comment, suggest. Hate the name? Say so :)

Personally, looks like my plate is very full for the next month or two, so I probably won't be actively working on the tool. I hope though the foundation is good enough and relatively documented, so it should be easy to pick up if anyone's interested in contributing.

Built with XUL

This has been a learning experience for me with XULRunner. I loved it. I love the idea of being able to create cross-OS desktop apps with JavaScript alone.

Behind the scenes, I'm using my JavaScript port of YUICompressor's CSSmin and Doug Crockford's JSMin. JSMin should be replaced with YUICompressor (or Google closure compiler) in the next release.

 

Performance job offers

Thursday, January 28th, 2010

I'm sure quite a few of you my fellow readers are crazy about web performance. And if you're seeking new challenges, timing can't be any better. Below are three excellent opportunities in three of the most high-traffic sites on the planet.

  • Yahoo
    Yahoo! Search is hiring a senior performance engineer. Yep, you'll be working with me and a bunch of incredible folks.
  • eBay
    eBay is hiring a performance engineer. I had the pleasure of delivering a tech talk there, it looks like a great place to be, fast-paced, and they do take performance seriously, lots of opportunities to sharpen your perf teeth (I don't have a URL, hit me up ssttoo at gmail if you're interested)
  • Facebook
    Facebook is hiring a performance engineer. Depending on who you trust, FB is the #2 or #3 most popular site, so the challenge is definitely there. I've spoken to several awesome people, like David Wei, performance engineer and researcher extraordinaire, and let me tell you, things are happening and you'll never be bored, even for a second.

And, not perf-related, but an extraordinary opportunity at YUI was announced today, it almost sounds too good to be true. One of the most important things about a job is the people you'll be working with. Well, with YUI you can't wish for a higher concentration of front-end brain power. It's scary :)

 

The performance business pitch

Thursday, December 24th, 2009

Dec 24 This post is the last article in the 2009 performance advent calendar experiment.

The idea for this post actually came from the awesome Jeremy Hubert. I met him after a tech talk at Yahoo! I co-delivered and we talked about the presentation. The talk contained an opening part with the various business stats that were presented at Velocity this year (The Year of the Business Metrics) - not that I thought anyone at the tech talk needed to hear about business metrics. I believe developers are naturally attracted to performance, the same way as we are attracted to writing good quality maintainable code. We just like what we do and want to be good at it. The reason I added the business part is because I wanted to give the attendees something to "take home" and present to their bosses should their bosses need convincing. Jeremy suggested I should've instead created a downloadable presentation for people who are interested. So here is the said "business pitch" presentation plus added front-end pitch.

» Download the pitch (PPT)

I intentionally kept the slides clean without much formatting. My hope is that you'll find these slides useful to skin, update, mashup and build upon and deliver to your bosses and clients that need to be convinced to invest in performance and give you the time you need to work on performance improvements.

The slides are also on slideshare. In case you happen to have your client attentive and you don't have the pitch handy, you can always deliver it from slideshare's full screen view.

The rest of the post is the same as the contents of the powerpoint slides. The text under each slide is also added as a comment in the downloadable pitch.

It goes like this:

The Business of Performance

Does anybody like to wait in line at the cash register? Probably not. The same way probably no one likes to sit and wait for slow web pages to finish loading.

The question is how does page loading time relate to the site's business metrics - metrics such as revenue, user satisfaction and repeat visits? And why should we invest in making pages load faster?

The way to test the impact of slow pages on business metrics is traditionally done with split testing. You randomly separate the users into two buckets. Half get normal page, half get the same page but artificially slowed down at the server side. Then you slow down a little more and see how the business metrics change.
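
As a rough illustration of the bucketing step, here is a minimal Python sketch. It is not how any of the sites mentioned here ran their experiments. Hash-based assignment is an assumption swapped in for plain random assignment, chosen so that a returning user always lands in the same bucket; the function names, the experiment label and the 500 ms default are all hypothetical.

```python
import hashlib


def bucket_for(user_id: str, experiment: str = "slowdown-test") -> str:
    """Deterministically place a user in the 'control' or 'delayed' bucket."""
    digest = hashlib.sha256(f"{experiment}:{user_id}".encode("utf-8")).digest()
    return "delayed" if digest[0] % 2 else "control"


def delay_ms(user_id: str, added_delay_ms: int = 500) -> int:
    """Artificial server-side delay to apply for this user (0 for control)."""
    return added_delay_ms if bucket_for(user_id) == "delayed" else 0


if __name__ == "__main__":
    users = [f"user-{n}" for n in range(1000)]
    delayed = sum(bucket_for(u) == "delayed" for u in users)
    print(f"{delayed} of {len(users)} users get the slowed-down page")
```

The business metrics are then compared between the two buckets; raising `added_delay_ms` step by step corresponds to the "slow down a little more" part.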

Conducting such experiments people quickly realize that slow is bad and stop the experiment in order not to lose users. So experiments usually end at half a second delay.

Bing.com (back when it was still live.com) was brave enough to introduce artificial slowdowns as long as 2 seconds. The results were that every one of the monitored business metrics changed exponentially for the worse.

Just focusing on the revenue metric, you can see that making the page 1 second slower causes a 2.8% drop in revenue. With a 2-second delay the revenue drops 4.3%.

A similar experiment was done at Google, where the search results page was artificially delayed by 0.4 seconds. This experiment shows two additional points:

  1. The negative effect on business metrics gets worse with time. The users may tolerate the slower page during first few visits but will then search increasingly less and even abandon the site
  2. The second point is that even after removing the delay, things didn't get back to normal. There was an after-effect. Some people left the site for good, moved to the competition and didn't return.

A similar experiment was done at Yahoo – as soon as the page gets 0.4 seconds slower, there is a drop in the full-page traffic. This means users leave the page before it's loaded completely (before the onload event fires) – they either hit the Back button or click away from the page.

At AOL there was a measurement which tied the performance of individual pages with the number of visits those pages get.

The results were pretty clear – slow pages get fewer visits. Users learn to avoid them.

While the previous results showed how business metrics go down as the site gets slower, there is one study from Shopzilla that shows a more positive perspective – how business metrics improve as the site gets faster.

Shopzilla redesigned their site and went from page loading times around 6 seconds down to 1.2 seconds. As a result all metrics improved. The revenue increased by 7 to 12%, they got 25% more page views, and the paid search traffic (from Search Engine Marketing) increased. And the required infrastructure to support the site was cut in half, which means they saved money while making more money.

Another piece of stats from Shopzilla illustrates how even the smallest change can have a positive impact. They moved static components to be served from a domain without cookies and this one small change resulted in 0.5% more revenue.

In Yahoo's list of performance best practices, this improvement is listed under number 24 (of 34) meaning that there's much more important and beneficial improvements to be made, but still – every little bit counts.

With regards to cutting down expenses and saving money, here's an example of what happened when Netflix enabled gzip compression for plain text components (gzipping is Yahoo's performance recommendation #4, meaning it's pretty important). Netflix noticed a sudden drop in outbound network traffic - they lowered their bandwidth bill by 43%.

When it comes to business metrics, different sites measure success differently. For some it's the number of new user signups, for some it's time spent on the site, for others – the amount of sales. But in general, what can we expect when improving performance and page loading times? More repeat visits, more time spent on the site viewing more pages, better conversions and revenue, happier users. All that while paying less for hardware and bandwidth.

Also more search engine traffic. Since April 2008 Google has taken your page loading time into account when ranking the sponsored search links (AdWords). Shopzilla's stats are a testament to that. And Matt Cutts from Google has mentioned at a conference that there's lobbying inside Google to use load time as a factor for organic search results too. So far this hasn't happened, but we can speculate that in the future a faster site may rank higher in Google's search results, bringing more new visitors to the site.

So a faster site means more business. And where do we start?

To answer that let's look at one page and see where the time is spent for loading this page.
(This is where you can substitute this example with your page, or your client's page. Then run the page through WebPageTest.org and copy the waterfalls to replace in the next two slides)

Here is the so-called "waterfall view" showing how the page loads. The first bar is the page itself. All the others are different page components required by the page such as images, scripts and stylesheets.

As you can see only 10% of the time is spent on assembling the page from different sources such as data from the database. We can call this portion server-side time.

Once the server has done constructing the page and has sent it to the browser, the rest is browser time. And that's where 90% of the time is spent – downloading all the page components. We can call this the front-end time.

Many people believe that the front-end time is not so important because all these images and other page components get downloaded once and then cached.

But even for repeat visits with full cache there's still a few components to be loaded, and the scripts and styles to be executed from the cache. In this page example, still only 38% of the total time is server time.

And Yahoo's research shows that 50-60% of all page visits (and 20% of all page views) will always be empty cache experience.

So the front-end matters and this is where the time is spent. For best ROI this is where you should be focusing – improving the front-end performance.

This is actually good news, because improving front-end performance is much easier than changing back-end systems, databases, infrastructure or reengineering server-side applications.

It's easier to improve the front-end and it results in greater overall benefits. Cutting the 10% in half won't make a big difference, while cutting the 90% part in half will have a significant effect on the page loading time and therefore on the business metrics.
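
The arithmetic behind that claim is short enough to spell out. The Python sketch below uses the 10%/90% split from the example page above; the function name is made up for illustration.

```python
def total_after_halving(server_share: float, halve: str) -> float:
    """Total load time (as a fraction of the original) after halving one part."""
    front_end_share = 1.0 - server_share
    if halve == "server":
        return server_share / 2 + front_end_share
    if halve == "front-end":
        return server_share + front_end_share / 2
    raise ValueError("halve must be 'server' or 'front-end'")


if __name__ == "__main__":
    for part in ("server", "front-end"):
        remaining = total_after_halving(0.10, part)
        print(f"halve the {part} time: page takes {remaining:.0%} of the original")
```

Halving the server-side time leaves the page at 95% of its original load time, while halving the front-end time brings it down to 55%.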

Credits/URLs

Thanks!

I hope you enjoyed the calendar and this last article. It was quite the experience for me, writing a post a night, sometimes way too late into the night. Many thanks to the guest bloggers - Christian, Eric and Ara (2 posts!) - thanks to you guys I was able to catch some sleep. Hopefully next year we can make it a bigger community effort with more contributors and a whole separate web site.

 

CSS performance: UI with fewer images

Wednesday, December 23rd, 2009

Dec 23 This post is the one-before-last article in the 2009 performance advent calendar experiment.

Often performance improvements come with their drawbacks, sometimes improving performance causes pains in other parts of the development process or strips stuff from the final product. Sometimes there's even a conflict where you have to pick: slow, unusable and beautiful or fast and looking like it was hacked with a blunt axe. But it doesn't have to be this way.

This post outlines some approaches to achieving common UI elements using CSS tricks in as many browsers as possible while using as few images as possible. Some of the tricks are brand new, some are very, very old, IE5.5 old. They all have in common the "fewer or no images" mantra. Using fewer images comes with some pronounced benefits:

  • less time spent in Photoshop
  • lighter page, fewer HTTP requests, less image payload
  • fewer elements in the sprite to maintain (and sometimes fewer sprites) which means longer lived sprites with fewer updates and cache invalidation
  • generally easier maintenance - it's much easier to change a color value than to update and push a new image version

Sometimes some browsers may not be fully supported but that's ok - as long as there's progressive enhancement and the basic page is usable, people rarely notice 1px glows and other ornaments.

So let's get started. BTW, a test page with the stuff discussed in the post is here.

Rounded corners

Yep, let's tackle the biggie.

Forget rounded corners in browsers that don't support border-radius. Period. It may be hard to argue this case, but definitely try. Doing rounded corners any other way than border-radius is a pain - it adds markup bloat, it makes you create more images or sprite elements. It's tougher to update. Just forget it. Forget rounded corners in IE < 9 (as rumor has it border-radius is coming to IE9). People may argue that IE is important for your audience. No doubt that's true, but rounded corners are not so important for the audience. Show your designer the Yahoo Search results page - the sidebar on the left-hand side. Not very rounded in IE. Do you think this was an easy battle - losing rounded corners in IE for such a high-profile site? Ask the man who won the battle ;)

So starting with a normal module - head, body and border:

The markup - nice and clean:

  <div class="module">
    <div class="hd"><h3>This is the header</h3></div>
    <div class="bd">
      <p>Here comes the content</p>
      <p>Here comes some more</p>
      <p>You can never have too much content, because
         content is king, right?
      </p>

    </div>
  </div>

Some fairly simple border radius to support Firefox, Webkit (Safari, Chrome, iPhone...) and, since a few days ago, Opera 10.5 alpha:

.module {
  -moz-border-radius: 9px;
  -webkit-border-radius: 9px;
  border-radius: 9px;
}

Result:

This is it! Easy-peasy, lemon squeezy.

Now, it's a little annoying to write three declarations for the same thing, but, hey - beats images and extra markup hands down. Also annoying are the differences when setting individual corners (-moz-border-radius-topleft is -webkit-border-top-left-radius). In this case we need to also round the header (class .hd) so it doesn't bleed through the beautifully rounded corners:

.hd {
  -moz-border-radius: 8px 8px 0 0;
  -webkit-border-top-left-radius: 8px;
  -webkit-border-top-right-radius: 8px;
  border-radius: 8px 8px 0 0;
}

Verdict:

  • Full support: Firefox, Safari, Chrome, Opera 10.5
  • Fallbacks: IE (corners are not rounded)

Drop shadows and glows

Another favorite effect designers love - dropping shadows. It's easy to enhance that existing .module without any new images:

.module {
  /* horizontal offset, vertical offset, blur radius, color with alpha */
  -webkit-box-shadow: 5px 5px 5px rgba(0, 0, 0, 0.5);
  -moz-box-shadow: 5px 5px 5px rgba(0, 0, 0, 0.5);
  box-shadow: 5px 5px 5px rgba(0, 0, 0, 0.5);
  /* IE */
  filter:progid:DXImageTransform.Microsoft.dropshadow(OffX=5, OffY=5, Color='gray');
  /* slightly different syntax for IE8 */
  -ms-filter:"progid:DXImageTransform.Microsoft.dropshadow(OffX=5, OffY=5, Color='gray')";
}

And now our module casts a shadow:

Now two notes for IE: first the shadow doesn't have alpha so it's not as nice and second, this filter may not play along with other filters in the same module. But the shadow is cast and that's a check for IE too, even IE5.5!

You may notice that in this case we basically need to more or less repeat the same declaration three times and the IE declaration two times. This is irksome, but hopefully keeping the strings close together should help gzip compression.

As for glowing, it's the same thing in FF, Webkit, Opera, only without any offset. For IE, there's a different filter called glow:

.glow {
  -webkit-box-shadow: 0 0 10px rgba(50, 50, 50, 0.8);
  -moz-box-shadow: 0 0 10px rgba(50, 50, 50, 0.8);
  box-shadow: 0 0 10px rgba(50, 50, 50, 0.5);
  filter:progid:DXImageTransform.Microsoft.glow(Strength=5, Color='gray');
  -ms-filter:"progid:DXImageTransform.Microsoft.glow(Strength=3, Color='gray')";
}

I added these declarations to a new class .glow so I can add the class name to modules that need to glow. The result:

The result as it glows in IE:

Now you see why I added only 3 pixels glow in IE and whole 5 in the rest. The IE glow is a little .. interesting. Also in IE8 (could be my VM, in IE6 XP no VM all looks OK) the glow seems to move slightly when you hover over the module.

Verdict for shadows and glows:

  • Full support: FF, Safari, Chrome, Opera, IE5.5 and up

Gradients

Ah, gradients. Sometimes so subtle that we, muggles and other mere mortals, don't see them even when we try our hardest. But for the designer they could be a life-or-death situation.

Let's make the head (class .hd) of our module a gradient without any images:

.hd {
  background-image: -moz-linear-gradient(top, #641d1c, #f00);
  background-image: -webkit-gradient(linear, left top, left bottom, from(#641d1c), to(#f00));
  filter:progid:DXImageTransform.Microsoft.gradient(startColorstr=#ff641d1c,endColorstr=#ffff0000);
  -ms-filter: "progid:DXImageTransform.Microsoft.gradient(startColorstr=#ff641d1c,endColorstr=#ffff0000)";
}

The result:

What a beautiful (code-speaking, of course, not so sure about visually beautiful) module. It has rounded corners, drop shadows and a gradient and so far we haven't used even a single image. Which means this reddish module can become blue, green or, god forbid, pink - with a single tweak in the code, the CMS or the user preferences (if you're building a social network for example).

Gradients verdict:

  • Full support: FF, Safari, Chrome, IE
  • Fallbacks: Opera (solid color)

... and RGBA for all

Being able to set the transparency of the background without affecting the transparency of the foreground (the text) is quite handy. That's why there's rgba() in CSS (red, green, blue, alpha). IE is not yet supporting it, but we can use the gradient filter which does support transparency. In this case we don't need the actual gradient so we set start and end color to the same thing. Also the background: transparent is needed for the whole thing to work in IE:

.rgba {
  background-color: transparent;
  background-color: rgba(200,200,200,0.8);
  filter:progid:DXImageTransform.Microsoft.gradient(startColorstr=#99dddddd,endColorstr=#99dddddd);
  -ms-filter: "progid:DXImageTransform.Microsoft.gradient(startColorstr=#99dddddd,endColorstr=#99dddddd)";
}
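
The filter's color strings use an #AARRGGBB layout: alpha first, each component as two hex digits. Converting by hand is error-prone, so here is a small hypothetical Python helper (the function names are made up) that turns rgba() components into that form and builds the same-color gradient declaration.

```python
def rgba_to_ie_hex(red: int, green: int, blue: int, alpha: float) -> str:
    """Convert CSS rgba() components to the #AARRGGBB string used by IE filters."""
    for channel in (red, green, blue):
        if not 0 <= channel <= 255:
            raise ValueError("color channels must be in 0..255")
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must be in 0..1")
    alpha_byte = round(alpha * 255)
    return "#{:02x}{:02x}{:02x}{:02x}".format(alpha_byte, red, green, blue)


def ie_filter(red: int, green: int, blue: int, alpha: float) -> str:
    """Build a same-color gradient filter declaration for a flat translucent fill."""
    color = rgba_to_ie_hex(red, green, blue, alpha)
    return ("filter:progid:DXImageTransform.Microsoft.gradient("
            f"startColorstr={color},endColorstr={color});")


if __name__ == "__main__":
    print(ie_filter(200, 200, 200, 0.8))
```

For example, rgba(200,200,200,0.8) maps to #ccc8c8c8, and a fully opaque #641d1c becomes #ff641d1c.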

The result is pleasantly cross-browser:

RGBA verdict

  • Full support: Firefox, Safari, Opera, Chrome, IE

Rotating images

It happens that sometimes you use the same image only flipped. For example open/close thingies, menus and such. How about reusing the same image and rotating it with CSS?

.arrow {
  background: url(arrow.png) no-repeat;
  display: block;
  float: left;
  width: 33px;
  height: 33px;
}
.right { /* this is the original image */ }
.left {
  -moz-transform: rotate(180deg);
  -webkit-transform: rotate(180deg);
  -o-transform: rotate(180deg);
  filter: progid:DXImageTransform.Microsoft.BasicImage(rotation=2);
  -ms-filter: "progid:DXImageTransform.Microsoft.BasicImage(rotation=2)";
}
.up {
  -moz-transform: rotate(270deg);
  -webkit-transform: rotate(270deg);
  -o-transform: rotate(270deg);
  filter: progid:DXImageTransform.Microsoft.BasicImage(rotation=3);
  -ms-filter: "progid:DXImageTransform.Microsoft.BasicImage(rotation=3)";
}
.down {
  -moz-transform: rotate(90deg);
  -webkit-transform: rotate(90deg);
  -o-transform: rotate(90deg);
  filter: progid:DXImageTransform.Microsoft.BasicImage(rotation=1);
  -ms-filter: "progid:DXImageTransform.Microsoft.BasicImage(rotation=1)";
}

Here's the result. Single image:

Arrow

Result:

the HTML:

<span class="arrow right"></span>
<span class="arrow left"></span>
<span class="arrow up"></span>
<span class="arrow down"></span>

You may notice that the CSS could be quite verbose for saving such small images. It's highly recommended you add the rotation code to a class and use the class name when necessary instead of repeating the same long declaration for every use case or image. Then pray to the gods of compression that this thing gzips well ;)

Verdict

  • Full support: Firefox, IE, Safari, Opera, Chrome

Multiple UI elements with the same background image

The last few tricks have something in common - they each use one background image. The background images are very small - usually around 100 bytes. The tiny image has some transparency to it and is placed as a background-image which sits on top of a background-color. Because of the transparency, the background color shines through, but differently depending on the transparency level of the image above it.

The result is - different UI elements with different colors (and even different hover colors) which can be part of CMS or part of user's skinning and they all reuse the same tiny background. So what can we do this way? A lot of interesting background effects, but here's a few.

Glossy buttons

Here's the end result:

All these buttons share the same background image. The image is 1x1000 and repeated horizontally. The 1000 is just to be safe, very safe, because 50, 100 or 1000 doesn't affect the file size, which is a mere 100 bytes. The upper half of the image is a little less transparent. The lower half is 100% transparent. When placed on top of the solid color the whole thing looks shiny and glossy. And you can change the color any way you like.

The HTML:

<p class="button">button1</p>
<p class="button button2">button2</p>
<p class="button button3">button3</p>
<p class="button button4">button4</p>
<p class="button button5">button5</p>
<p class="button button6">button6</p>

And the CSS can't be simpler:

.button {
  background-image:url(http://tools.w3clubs.com/mask/mask.php?x=1000&type=h);
  background-position: center;
}
.button:hover {background-color: #F29222;}
.button2 {background-color: #A41D1C;}
.button3 {background-color: #0F6406;}
.button4 {background-color: #333f79;}
.button5 {background-color: black;}
.button6 {background-color: orange; color: black;}

Actually in the test page I have inlined the image with data URI to save the whole HTTP request for such a teeny image.
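
Inlining means base64-encoding the image bytes and putting them where the URL would go. A rough sketch of how that could be scripted in Node.js (Buffer is a Node global; the function names and the way you obtain the bytes are assumptions, not what the test page actually does):

```javascript
// Build a data: URI from raw image bytes (array of byte values or a Buffer).
function toDataUri(bytes, mimeType) {
  return 'data:' + mimeType + ';base64,' + Buffer.from(bytes).toString('base64');
}

// Produce a CSS rule that uses the inlined image as a background.
function backgroundRule(selector, bytes, mimeType) {
  return selector + ' {background-image: url(' + toDataUri(bytes, mimeType) + ');}';
}
```

For an image of around 100 bytes the base64 overhead (about a third more) is tiny compared to the cost of an extra HTTP request.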

As you can see in the URL of the background, I've done a little script to generate some background images:
http://tools.w3clubs.com/mask/mask.php?x=1000&type=h
The image generator's source code is right here.

Stripes

Same technique - but used to generate striped background:

It's basically the same code, only using a different call to the image generator to give us a different background image.

HTML:

  <div class="module stripe earth glow">
    <div class="bd">
      <p>striped background</p>
    </div>

  </div>

  <div class="module stripe tech glow">
    <div class="hd phony stripe"><h3>stripity-stripes</h3></div>
    <div class="bd">
      <p>striped background with the same background image</p>
    </div>
  </div>

CSS:

.stripe {background-image: url(http://tools.w3clubs.com/mask/mask.php?type=stripe);}
.earth  {background-color: olive;}
.tech   {background-color: #bbb;}
.phony  {background-color: #0F6406;}

Again, this image can be a data URI so we save the single HTTP request.

And another gradient

So if you don't like the previously discussed way to do gradients, here's another one. The same trick with the solid color background and a semi-transparent image on top.

Result:

The background images as generated by the service are:
// lighter at the top
http://tools.w3clubs.com/mask/mask.php?type=gradient
// darker at the top
http://tools.w3clubs.com/mask/mask.php?type=gradient&flip=1

Again, you can see the test page here and the source for the image generation is here.

For yet another example of this technique check my post on this (abandoned) blog phonydev.com. There I take an image and a mask image generated by the same script and overlay to achieve an iPhone-like glossy button.

iphone glossiness

Thanks!

Kind of long post, but I hope you're excited about removing a bunch of images from your future designs. If I've omitted some details, please let me know in the comments.

 

iPhone caching

Tuesday, December 22nd, 2009

Dec 22 This post is part of the 2009 performance advent calendar experiment. Stay tuned for the articles to come - only 2 to go!

Some time ago there was a post on YUIBlog highlighting the findings of Wayne Shea and Tenni Theurer on the caching behavior of the iPhone. Curious if things have changed several iPhone OS updates later, I ran some experiments with OS3 and OS3.1.

Highlights

  • iPhone will not cache components bigger than 15K (was 25K in the previous experiment)
  • total cache is about 1.5MB (was 500K)
  • short-term memory cache will store components up to 1941K provided the component size does not divide by 4
  • powering off the device still clears all the cache, Expires headers still important
  • not sure since when, but closing all tabs clears the cache too
  • consider HTML5 application cache to improve cacheability and provide offline experience – components in the application cache can be bigger than 15K and stay cached even after you clear Safari's cache from the Settings app (or clear it any other way)

15 is the new 25

The previous experiment showed that the iPhone will not cache components bigger than 25K, but this number is now down to 15K. And this is ungzipped size, meaning if you have for example a 20K JavaScript and your server sends it gzipped down to 10K, it will not be cached by the iPhone.

So minification can definitely help you here. Minification may not be as beneficial as gzipping in desktop browsers, but it may be the difference between a cache hit or miss in the iPhone. Minify, then gzip. And, in general, try to keep your component sizes low - that always helps in any browser.

Total cache size

First off, how was the experiment run? I simply tailed the access log of a server, in order to monitor which components get requested:

$ tail -f ~/logs/http/access.log | grep "XXX.YYY.ZZZ.000"

Where XXX.YYY.ZZZ.000 is the IP address. You can also replace that with just "iPhone" if you think no one else is hitting this site with an iPhone.

Then I requested components with different sizes and looked at what the log tail said.

Once it was clear that 15K is the maximum component size, the next question was how many of those 15K components can be cached before the iPhone runs out of space allocated for caching. The result was 105. Attempting to cache one more file resulted in existing ones being removed from the cache. Then playing with the size of the very last component helped nail the exact amount of space available in the cache – 105 * 15 + 7 = 1582K.

So the total cache is just a little over 1.5 MB (earlier research showed 0.5MB for earlier iPhone OS). Keep in mind that this is the total cache shared between all pages, it's not per domain or per tab.

Memory cache

The iPhone has a little bit more cache space in the form of memory cache. The memory cache has no per-component size limit, so your components can be as big as the memory cache – 1941K (don't ask how many attempts it took to nail this number).

There's a little catch though, probably due to a bug: if the component size divides by 4, it won't be cached in memory. That's true for JavaScripts – they will be requested on every page load. CSS files with sizes over 15K that divide by 4 will not be requested in the same tab session, but if you close the tab and open the page in another tab, they will be requested again.

So components under 15K go to the disk cache, a 16K component is never cached (and neither is 20, 24, …, 36, … 100, … 1024 and so on), and components of 17K and up (up to almost 2 megs) are cached in memory.

As you can guess, this memory cache is very unreliable because it gets cleared very often. It's probably useful only within the same user session on your web site.
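
The size rules observed in these experiments can be summed up in a small sketch. The function name and the whole-kilobyte granularity are assumptions for illustration; the thresholds are only the ones measured in this post on OS 3.x, not documented behavior, and the CSS same-tab exception mentioned above is left out.

```javascript
// Sizes are in K (ungzipped), as measured in the experiments above.
var DISK_MAX_COMPONENT_K = 15;
var DISK_TOTAL_K = 105 * 15 + 7;   // 1582K shared by all pages
var MEMORY_MAX_COMPONENT_K = 1941;

// Returns 'disk', 'memory' or 'none' for a component of the given size.
function iphoneCacheFor(sizeK) {
  if (sizeK <= DISK_MAX_COMPONENT_K) return 'disk';
  if (sizeK > MEMORY_MAX_COMPONENT_K) return 'none';
  if (sizeK % 4 === 0) return 'none'; // the "divides by 4" quirk
  return 'memory';
}
```

A helper like this could flag components in a build report that would miss the disk cache until they are trimmed or split.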

Some observations

  • Closing all tabs, except for blank ones (as when you do "New Page") and then closing Safari is the same as clearing the cache from Settings. So it's a good idea in your normal daily use to leave at least one page open, before you close Safari in order not to cause the implicit cache clearing.
  • Tapping the reload icon in the address bar sends unconditional requests for all components, without the If-Modified-Since header and ignoring the Expires header. So to speed up your browsing, refresh the page by tapping the address bar and then tapping GO, don't use the refresh icon.
  • The tests I ran were using OS 3.0 and OS 3.1.2. I got my phone last Christmas which means that according to this page it came with OS 2.2. I don't remember if I ran any tests with OS 2.2 too. I do remember though that I tried some tests back when I didn't have a phone and asked Ryan Grove and Nicole Sullivan for help. Chatting over IM, they were loading pages while I was tail-ing the server log. Those tests must have been with OS 2 or OS 2.1. Back then the results showed that the limit for a component that gets stored in the cache was 10Meg, which was also the total size of the cache. Now I suspect that back then we only tested the memory cache, not the disk cache. In any event, it's important to note that those restrictions are all software restrictions. No matter what model the phone, it's the OS that sets the limits. Different models with the same OS will behave the same when it comes to caching sizes.

HTML5 offline application cache

All in all we can safely summarize that the iPhone has no cache to speak of. 15K per component is nothing and the limit of 1.5Megs shared with all other pages will have your cached components kicked out pretty quickly.

So – what's an iPhone performance optimizer to do? Use HTML5 goodies.

One thing to consider is some form of client storage supported by the mobile webkit (either key-value or SQLite), but that will require some of your javascript to handle the caching, expiration and so on.

Another thing to do is use the offline application cache. It's meant to support applications to work offline, but it also ends up being useful to improve caching of online applications as well.

What you need is a simple text file called a manifest. In there you list all the components required by your application. E.g.

CACHE MANIFEST

/root/path/to/images/image.jpg
or/maybe/relative/paths/too/scripts.js
http://example.org/good-old/absolute/path/oojs-home.jpg

It's important to serve this file with content type text/cache-manifest.
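
Since the manifest can be generated dynamically, here is a minimal sketch of doing that from Node.js. The function names are hypothetical, and serveManifest is written as a plain request handler (the shape Node's http.createServer expects, with the URL list added as a third argument) rather than as a full server.

```javascript
// Build the manifest body from a list of component URLs.
function buildManifest(urls) {
  return 'CACHE MANIFEST\n\n' + urls.join('\n') + '\n';
}

// Send the manifest with the content type the browser requires.
// `res` is expected to behave like a Node http.ServerResponse.
function serveManifest(req, res, urls) {
  res.writeHead(200, {'Content-Type': 'text/cache-manifest'});
  res.end(buildManifest(urls));
}
```

With a static file the same thing is usually done in the web server configuration by mapping the .manifest extension to text/cache-manifest.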

Then in your html tag just point to that file. Let's say you called the manifest mycache.manifest (could be dynamic, php, or anything, as long as it's served with the proper content type). So your HTML should start like the following:

<!DOCTYPE html>
<html manifest="mycache.manifest">

And this is it. There's also a JavaScript API available if you want to play with the items stored in the cache.

Now the browser will request the manifest file every time (the Expires header won't help you here) and if its contents have changed it will quietly and unobtrusively download the updated components in the background, so the next time the user will see the updated page. Needless to say it makes sense that all the pages of the site share the same manifest.

The best part is that when using offline cache none of the restrictions mentioned above apply – you can store files bigger than 15K and you don't share your total cache space with anyone.

I used this for a personal project – whomsy.com if you want to play with some requests and monitor the traffic (e.g. using the iPhone simulator and Charles proxy).

Manifest – not just for iPhone?

The manifest idea sounds so nice, it makes sense to apply it to desktop browsers as well. In a way it's similar to the idea of having web archives – a zip with all page components (sorry, can't find the URL for the proposal for web archives right now). You can include the manifest on your homepage and, after onload, the browser preloads all the other components used by internal pages (no need to do preloading yourself in JavaScript, no need to worry about spriting images and so on). This is absolutely doable and it also provides the side benefit that your app is available offline, which is the original purpose of the offline cache :) However the application of this technique is a little limited.

  • Browser support – currently Webkit browsers support manifests, and IE doesn't. Firefox does support offline cache, however it shows an ugly confirmation message at the top of the page and you normally don't want to stress out your users with warning messages (see screenshot below)
  • The "homepage" meaning the page that includes the cache manifest also gets cached implicitly. This could be undesired behavior for many sites, but could work just fine for Ajaxy one-page-sites and applications.

Overall, I'm pretty enthusiastic about using offline cache for the purposes of normal page caching too. The future is here, it's just not evenly distributed (nor supported in IE) yet :)

 

Progressive rendering via multiple flushes

Monday, December 21st, 2009

Dec 21 This post is part of the 2009 performance advent calendar experiment. Stay tuned for the articles to come.

Perceived page loading time is just as important as the real loading time. And when it comes to user perception, visible indication of progress is always good. The user gets feedback that something is going on (and in the right direction) and feels much better.

Using multiple content flushes allows you to improve both the real and the perceived performance. Let's see how.

The head flush()

While your server is busy stitching the HTML from different sources - web services, database, etc - the browser (and hence the user) just sits and waits. Why don't we make it work and start downloading components we know will be absolutely needed, such as the logo, the sprite, css, javascript... While the server is busy, you can send a part of the HTML, for example the whole <head> of the document. In there you can put the references to external components such as the CSS, which then the browser can head-start downloading while waiting for the whole HTML response.

<html>
<head>
  <title>the page</title>
  <link rel=".." href="my.css" />
  <script src="my.js"></script>
</head>
<?php flush(); ?>
<body>
  ...

Doing something like this will result in shorter waterfalls, because more downloads can happen in parallel. In the waterfall below the page is not yet completed at 0.4 seconds, yet the browser has already requested more components.
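
The same idea is not tied to PHP. As a rough sketch, here is how an early head flush might look in Node.js, where the first res.write() call sends the headers plus that first piece of the body. The function name respondInTwoParts and the callback-style buildBody are assumptions for illustration; proxies or gzip buffering in front of the app can still hold the output back.

```javascript
// Send the <head> right away, then the body once the slow work is done.
// `res` behaves like a Node http.ServerResponse; `buildBody` is a hypothetical
// function that does the slow work and calls back with the body HTML.
function respondInTwoParts(res, buildBody) {
  res.writeHead(200, {'Content-Type': 'text/html'});
  // First part: the browser can start fetching my.css and my.js now.
  res.write('<html>\n<head>\n  <title>the page</title>\n' +
            '  <link rel="stylesheet" href="my.css" />\n' +
            '  <script src="my.js"></script>\n</head>\n');
  // Second part: sent whenever the server is done stitching the page.
  buildBody(function (bodyHtml) {
    res.end('<body>\n' + bodyHtml + '\n</body>\n</html>\n');
  });
}
```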

One step further - multiple flushes

While having the browser busy is good and the whole page loads faster, can we do better? How about letting the user see something while the server is still busy? Remember - show something "in a blink of an eye". And how about doing the flushing several times, hence rendering the page in stages? This will help show usable partial versions of the page without waiting on a potentially long-loading page or waiting for some blocking JavaScript to load.

Here's an example - Google search results.

The header part of that page (chunk #1) doesn't need any complicated logic. True, the page title and pre-filling the input box are dynamic parts, but this is just a simple echo of the user input, nothing that requires complex work. So out goes the header. Notice that the number of search results is not visible yet. In this chunk there's the logo, so the sprite is downloaded. If this page was using external CSS, it would be included in the head too.

Then, the search results, the meat of the page. Out it goes, as a static HTML chunk #2.

So far the page is done, but not quite yet. There's some progressive enhancement of the page which requires JavaScript. And JavaScript blocks. So include it in the footer as chunk #3.

The page is usable even without the JavaScript and without the footer. The user mostly cares about the results, so chunk #2 is what matters most. Chunk #3 can even get lost in transfer. As for chunk #1 - it's just to give feedback that "hey, we're working on your query". The first chunk actually tricks the user to believe that the query is already done. "Heck", concludes the user seeing that the page is already coming up, "that was FAST" :)

Boring details - HTTP 1.1 chunked encoding

So how does this work actually, how come the HTML is served in parts?

The answer is - HTTP 1.1 chunked encoding. A normal HTTP response looks like:

Headers...
[One empty line]
<html><body>response...

A chunked response would be like:

Headers...
Transfer-Encoding: chunked
More headers...

size of chunk #1
<html><body>...chunk #1...

size of chunk #2
...chunk #2...

size of chunk #3
...chunk #3 </body></html>

0 (meaning "the end!")

The chunk sizes are given in hexadecimal. Here's an example response (from the wikipedia article)

HTTP/1.1 200 OK
Content-Type: text/plain
Transfer-Encoding: chunked

25
This is the data in the first chunk

1C
and this is the second one

0
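
To make the framing concrete, here is a minimal sketch of an encoder and decoder for a chunked body in Node.js. The function names are made up, and chunk extensions and trailers are ignored. On the wire each size line and each chunk ends with CRLF; in the example above the sizes (0x25 = 37, 0x1C = 28) also count a CRLF that is part of the chunk data itself, which is why a blank line shows up after each chunk.

```javascript
// Encode an array of strings as an HTTP/1.1 chunked body.
function encodeChunked(chunks) {
  var out = '';
  for (var i = 0; i < chunks.length; i++) {
    var size = Buffer.byteLength(chunks[i], 'utf8'); // sizes are in bytes
    if (size === 0) continue; // a zero-size chunk would signal the end
    out += size.toString(16).toUpperCase() + '\r\n' + chunks[i] + '\r\n';
  }
  return out + '0\r\n\r\n'; // terminating chunk
}

// Decode a chunked body back into its chunks.
function decodeChunked(body) {
  var buf = Buffer.from(body, 'utf8');
  var chunks = [];
  var pos = 0;
  while (pos < buf.length) {
    var lineEnd = buf.indexOf('\r\n', pos);
    if (lineEnd === -1) break;
    var size = parseInt(buf.toString('ascii', pos, lineEnd), 16);
    if (!size) break; // "0" (or garbage) ends the body
    var start = lineEnd + 2;
    chunks.push(buf.toString('utf8', start, start + size));
    pos = start + size + 2; // skip the data and its trailing CRLF
  }
  return chunks;
}
```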

Chunking/flushing strategies

There's basically two paths you can take when it comes to chunking:

Application-level chunking
when your web app knows when to flush, based on some logic. The Google search above is an example of application-level flushing - header, body, footer are parts of the page, known to the web application
Server-level chunking
when your application doesn't worry about how the content is delivered, but leaves this task to the server. The server can choose some strategy for flushing, for example once every 4K of output. Google search does this when the user agent doesn't support gzip - it flushes out every 4k. Bing.com does it similarly - flushing about every 1k (sometimes 2K, sometimes less) regardless of the user agent's support for gzip. Interestingly enough bing's first chunk is often non-readable characters - just something that tells the browser - hey, I'm alive, here's your first byte

Amazon is an interesting example of doing a mix of both strategies - it looks like sometimes it's server level (e.g. in the middle of an html tag) but sometimes it looks like the chunk contains (or wraps up) a page section. Amazon is also a good example of focusing on what's important on a page (Why is the user here? What do they want? What do we want them to do?) and making sure it's rendered first.

The areas I've marked in this screenshot do not correspond to chunks exactly, but they show how the page renders progressively, using a combination of chunked response and source order:

  1. #1 - header. Every page has one. Get done with it.
  2. #2 - buy now. This is what we want the user to do.
  3. #3 - image. The user wants to see what they're buying. Probably also helps Amazon minimize merchandise returns :)
  4. #4 - title/price. Kind of important too.

The rest of the page - reviews, comments, also buy... all this can wait, it's all secondary. Most of it is way below the fold anyway.

Tools: none

Unfortunately, as far as I know, there's no tool that offers visibility into those chunks - what the contents of each one are and what it looks like.

Fiddler lets you see the encoded chunked response, but that doesn't help too much. At least it gives you an idea that the response was chunked - you can see under Inspectors/Transformer that there is a "Chunked Transfer-Encoding" checkbox.

Also in Fiddler under the next tab - Headers, you can see the chunked encoding header. And also Fiddler helpfully tells you that the response has been encoded (the yellow message on top)

HTTPWatch lets you see the incomprehensible response, but it also tells you the number of chunks. Note that the number includes the last 0 in the response, so when it says 4 chunks it actually means 3.

I also tried to fill the void in the tools department by attempting a Firefox extension. Unfortunately it didn't work: I couldn't find an API exposed to extensions that would give me access to the raw encoded response. It looks like it would be possible as an extension to HTTPWatch or Fiddler though - both offer extensibility, both show the raw response.

For my own use I wrote a PHP script to request the page and give me the chunks ungzipped. It's very primitive but you can give it a shot here. Test with Yahoo! Search for example.

Works with gzip!

A common concern is whether chunked encoding works together with gzipping the response. Yes, it does. In this presentation Steve Souders sheds some light (PPT, see slide #66) on how to address common issues and also gives flush() equivalents in languages other than PHP.

There's a number of things that can be in the way of successfully implementing chunked encoding+gzip including:

  • call ob_flush() in addition to flush(), and be careful if you have several output buffers started: you may need to iterate and flush all of them
  • some browsers may require some minimal amount of content before starting to parse, IE needs at least 256 bytes
  • you may need a newer version of Apache
  • DeflateBufferSize in Apache may be set too high for your chunk size
  • check the user-contributed comments in the php.net manual for flush() for helpful advice and ideas
  • there's also ob_implicit_flush(), which may flush for you instead of you calling flush() every time

Do it!

It may be tricky to implement multiple (or even single) flushing, but it's well worth it. There are some server setup hurdles when it comes to gzip, but once you figure it out, you only do it once. As a reward you get faster loading times, plus progressive rendering so your page not only has a faster time-to-onload but feels that way too.

 

Extreme JavaScript optimization

Sunday, December 20th, 2009

Dec 20 This article is part of the 2009 performance advent calendar experiment. Today's article is a second contribution from Ara Pehlivanian (here's the first).

There's a Belorussian translation provided by Patricia. Thanks!

Ara Pehlivanian has been working on the Web since 1997. He's been a freelancer, a webmaster, and most recently, a Front End Engineer at Yahoo! Ara's experience comes from having worked on every aspect of web development throughout his career, but he's now following his passion for web standards-based front-end development. When he isn't speaking and writing about best practices or coding professionally, he's either tweeting as @ara_p or maintaining his personal site at http://arapehlivanian.com/.

There's an odd phenomenon underway in the JavaScript world today. Though the language has remained relatively unchanged for the past decade, there's an evolution afoot among its programmers. They're using the same language that brought us scrolling status bar text to write some pretty heavy duty client-side applications. Though this may seem like we're entering a Lada in an F1 race, in reality we've spent the last ten years driving an F1 race car back and forth in the driveway. We were never using the language at its full potential. It took the discovery of Ajax to launch us out of the driveway and onto the race track. But now that we're on the track, there's a lot of redlining and grinding of gears going on. Not very many people it seems, know how to drive an F1 race car. At least not at 250 mph.

The thing of it is, it's pretty easy to put your foot to the floor and get up to 60 mph. But very soon you'll have to shift gears if you want to avoid grinding to a halt. It's the same thing with writing large client-side applications in JavaScript. Fast processors give us the impression that we can do anything and get away with it. And for small programs it's true. But writing lots of bad JavaScript can very quickly get you into situations where your code begins to crawl. So just like an average driver needs training to drive a race car, we need to master the ins and outs of this language if we're to keep it running smoothly in large scale applications.

Variables

Let's take a look at one of the staples of programming, the variable. Some languages require you to declare your variables before using them; JavaScript doesn't. But just because it isn't required doesn't mean you shouldn't do it. That's because in JavaScript if a variable isn't explicitly declared using the 'var' keyword, it's considered to be a global, and globals are slow. Why? Because the interpreter needs to figure out if and where the variable in question was originally declared, so it goes searching for it. Take the following example.

function doSomething(val) {
    count += val;
};

Does count have a value assigned to it outside the scope of doSomething? Or is it just not being declared correctly? Also, in a large program, having such generic global variable names makes it difficult to keep collisions from happening.
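
One way to remove the ambiguity is to declare the counter with var in the nearest scope that needs it. The sketch below is only an illustration (makeCounter is a made-up name): count lives in the enclosing function, so the lookup stops one level up instead of going all the way to the global object. As an aside, count += val reads count before writing it, so if count had never been defined anywhere, the original doSomething would throw a ReferenceError rather than silently create a global.

```javascript
function makeCounter() {
  var count = 0; // explicitly declared: local to makeCounter, not a global

  return function doSomething(val) {
    count += val; // found in the enclosing scope, no walk to the global object
    return count;
  };
}

var doSomething = makeCounter();
doSomething(2); // count is now 2
doSomething(3); // count is now 5
```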

Loops

Searching the scope chain for where count is declared in the example above isn't such a big deal if it happens once. But in large-scale web applications, not very much just happens once. Especially when loops are concerned. The first thing to remember about loops, and this isn't just for JavaScript, is to do as much work outside the loop as possible. The less you do in the loop, the faster your loop will be. That being said, let's take a look at the most common practice in JavaScript loops that can be avoided. Take a look at the following example and see if you can spot it:

for (var i = 0; i < arr.length; i++) {
    // some code here
}

Did you see it? The length of the array arr is looked up again every time the loop iterates. A simple fix for this is to cache the length of the array like so:

for (var i = 0, len = arr.length; i < len; i++) {
    // some code here
}

This way, the length of the array is calculated just once and the loop refers to the cached value every time it iterates.

So what else can we do to improve our loop's performance? What other work is being done on every iteration? We're evaluating whether the value of i is less than the value of len and we're also increasing i by one. Can we reduce the number of operations here? We can if the order in which our loop is executed doesn't matter.

for (var i = 100; i--; ) {
    // some code here
}

This loop will execute 50% faster than the one above because on every iteration it simply subtracts a value from i, and since that value is not "falsy," in other words it isn't 0, then the loop goes on. The moment the value hits 0, the loop stops.

You can do this with other kinds of loops as well:

while (i--) {
    // some code here
}

Again, because the evaluation and the operation of subtracting 1 from i is being done at the same time, all the while loop needs is for i to be falsy, or 0, and the loop will exit.
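
Applied to an array, the pattern starts i at the array's length; because i-- is tested before the body runs, the body sees the indexes from length - 1 down to 0. A small sketch (forEachReverse is a hypothetical helper name):

```javascript
// Visit every item of arr from last to first using a single test-and-decrement.
function forEachReverse(arr, fn) {
  var i = arr.length;
  while (i--) {          // tests the old value of i, then decrements it
    fn(arr[i], i);       // i is already the index of the current item here
  }
}

// Usage: sum an array when the visiting order doesn't matter.
function sum(arr) {
  var total = 0;
  forEachReverse(arr, function (n) { total += n; });
  return total;
}
```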

Caching

I touched briefly on caching above when we cached the array length in a variable. The same principle can be applied in many different places in JavaScript code. Essentially, what we want to avoid doing is sending the interpreter out to do unnecessary work once it's already done it once. So for example, when it comes to crawling the scope chain to find a global variable for us, caching the reference locally will save the interpreter from fetching it every time. Here, let me illustrate:

var aGlobalVar = 1;

function doSomething(val) {
    var i = 1000, agv = aGlobalVar;
    while (i--) {
        agv += val;
    };
    aGlobalVar = agv;
};

doSomething(10);

In this example, aGlobalVar is only fetched twice, not over a thousand times. We fetch it once to get its value, then we go to it again to set its new value. If we had used it inside the while loop, the interpreter would have gone out to fetch that variable a thousand times. In fact, the loop above takes about 3ms to run whereas if agv += val; were replaced with aGlobalVar += val; then the loop would take about 10ms to run.

Property Depth

Nesting objects in order to use dot notation is a great way to namespace and organize your code. Unfortunately, when it comes to performance, this can be a bit of a problem. Every time a value is accessed in this sort of scenario, the interpreter has to traverse the objects you've nested in order to get to that value. The deeper the value, the more traversal, the longer the wait. So even though namespacing is a great organizational tool, keeping things as shallow as possible is your best bet at faster performance. The latest incarnation of the YUI Library evolved to eliminate a whole layer of nesting from its namespacing. So for example, YAHOO.util.Anim is now Y.Anim.
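
When you can't flatten a namespace, the caching trick from the previous section still applies: resolve the deep path once and keep the result in a local variable. A minimal sketch, using a made-up APP.util.math namespace:

```javascript
// Hypothetical nested namespace, for illustration only.
var APP = {util: {math: {square: function (n) { return n * n; }}}};

function sumOfSquares(values) {
  var square = APP.util.math.square; // traverse the nesting once, outside the loop
  var total = 0;
  for (var i = 0, len = values.length; i < len; i++) {
    total += square(values[i]);      // local lookup on every iteration
  }
  return total;
}
```

This works as written because square doesn't rely on this; a method that does would need to be called on its object or bound first.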

Summary

These are just a few examples of how to improve your code's performance by paying attention to how the JavaScript interpreter does its work. Keep in mind though that browsers are continually evolving, even if the language isn't. So for example, today's browsers are introducing JIT compilers to speed up performance. But that doesn't mean we should be any less vigilant in our practices. Because in the end, when your web app is a huge success and the world is watching, every millisecond counts.

 

The new game show: “Will it reflow?”

Saturday, December 19th, 2009

Dec 19 This post is part of the 2009 performance advent calendar experiment. Stay tuned for the articles to come.

Intrigued by Luke Smith's comment and also Alois Reitbauer's comment on the previous post about rendering I did some more testing with dynaTrace and SpeedTracer. Also prompted by this tweet, I wanted to provide an example of avoiding reflows by using document fragments as well as hiding elements with display: none. (btw, sorry that I'm slow to respond to tweets and blog comments, just too much writing lately with the crazy schedule, but I do appreciate every tweet and comment!)

So welcome to the new game show: "Will it reflow?" where we'll look into a few cases where it's not so clear if the browser will do a reflow or just a repaint. The test page is here.

Changing classnames

The first test is fairly straightforward - we only want to check what happens when you change the class name of an element. So we use "on" and "off" class names and swap them on mouse over and mouse out.

.on {background: yellow; border: 1px solid red;}
.off {background: white; border: 1px dashed green;}

Those CSS rules shouldn't trigger a reflow, because no geometry is being changed. The test is pushing it a bit by changing borders, which may affect geometry, but they don't in this case.

The test code:

// test #1 - class name change - will it reflow?
var onoff = document.getElementById('on-off');
onoff.onmouseover = function(){
  onoff.className = 'on' ;
};
onoff.onmouseout = function(){
  onoff.className = 'off';
};

Sooo.. will it reflow?

In Chrome - no! In IE - yes.

In IE, even changing the class name declarations to only change color, which is sure not to cause reflow, still caused a reflow. Looks like in IE, any type of className change causes a reflow.

cssText updates

The recommended way to update multiple styles in one shot (less DOM access, fewer reflows) is to update the element's style.cssText property. But.. will it reflow when the style changes do not affect geometry?

So let's have an element with a style attribute:

...style="border: 1px solid red; background: white"...

The JavaScript to update the cssText:

// test #2 - cssText change - will it reflow?
var csstext = document.getElementById('css-text');
csstext.onmouseover = function(){
  csstext.style.cssText += '; background: yellow; border: 1px dashed green;';
};
csstext.onmouseout = function(){
  csstext.style.cssText += '; background: white; border: 1px solid red;';
};

Will it reflow?

In Chrome - no! In IE - yes.

Even having cssText (and the initial style) only play with color, there's still a reflow. Even trying to just write the cssText property (as opposed to read/write with += ) still causes a reflow. The fact that cssText property is being updated causes IE to reflow. So there might be cases where setting individual properties separately (like style.color, style.backgroundColor and so on) which don't affect geometry, might be preferable to touching the cssText.
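
As a rough sketch of that alternative, a small helper can write paint-only properties one by one instead of going through cssText. The helper name setPaintStyles and the choice of properties are assumptions made up for illustration, and whether this really avoids the reflow in IE would still need to be confirmed in dynaTrace.

```javascript
// Hypothetical helper: set non-geometric (paint-only) properties one at a
// time through el.style, leaving cssText untouched.
function setPaintStyles(el, styles) {
  for (var prop in styles) {
    if (Object.prototype.hasOwnProperty.call(styles, prop)) {
      el.style[prop] = styles[prop];
    }
  }
}

// Usage in a page (guarded so the snippet also loads outside a browser)
if (typeof document !== 'undefined') {
  var box = document.getElementById('css-text');
  if (box) {
    box.onmouseover = function () {
      setPaintStyles(box, { color: 'red', backgroundColor: 'yellow' });
    };
  }
}
```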

Next contestant in the game show is...

addRule

Will the browser reflow when you update stylesheet collections programmatically? Here's the test case using addRule and removeRule (which in Firefox are insertRule/deleteRule):

// test #3 - addRule - will it reflow?
var ss = document.styleSheets[0];
var ruletext = document.getElementById('ruletext');
ruletext.onmouseover = function(){
  ss.addRule('.bbody', 'color: red');
};
ruletext.onmouseout = function(){
  ss.removeRule(ss.rules.length - 1);
};

Will it? Will it?

In Chrome - yes. The fact that style rules in the already loaded stylesheet have been touched, causes Chrome to reflow and repaint. Even though class .bbody is never used. Same when creating a new rule with selector body {...} - reflow, repaint.

In IE there's a repaint definitely, and there's also a kind of reflow. Looks like dynaTrace shows two kinds of rendering calculation indicators: "Calculating generic layout" and "Calculating flow layout". Not sure what is the difference (web searches disappointingly find nada/zero/rien for the first string and my previous blog post for the second). Hopefully "generic" would be less expensive than "flow".
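
Since the test above uses the IE-style method names, here is a minimal sketch of a wrapper that picks whichever pair of methods the stylesheet object offers. The function names addCssRule and removeCssRule are made up for this example; only insertRule/deleteRule and addRule/removeRule are real stylesheet methods.

```javascript
// Hypothetical wrapper around the two stylesheet APIs.
// Returns the index of the new rule so it can be removed later.
function addCssRule(sheet, selector, declarations) {
  var index;
  if (sheet.insertRule) {                 // insertRule/deleteRule flavor
    index = sheet.cssRules.length;
    sheet.insertRule(selector + ' {' + declarations + '}', index);
  } else {                                // addRule/removeRule flavor (older IE)
    index = sheet.rules.length;
    sheet.addRule(selector, declarations, index);
  }
  return index;
}

function removeCssRule(sheet, index) {
  if (sheet.deleteRule) {
    sheet.deleteRule(index);
  } else {
    sheet.removeRule(index);
  }
}
```

Keeping the returned index around is simpler than recomputing rules.length - 1 at removal time, assuming nothing else edits the stylesheet in between.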

display: none

In my previous post I boldly claimed that elements with display: none will not have anything to do with the render tree. IE begs to differ (thanks to dynaTrace folks for pointing that out).

A good way to minimize reflows is to update the DOM tree "offline" out of the live document. One way to do so is to hide the element while updates are taking place and then show it again.

Here's a test case where rendering and geometry are affected by simply adding more text content to an element by creating new text nodes.

// test #4 - display: none - will it reflow
var computed, tmp;
var dnonehref = document.getElementById('display-none');
var dnone = document.getElementById('bye');
if (document.body.currentStyle) {
  computed = dnone.currentStyle;
} else {
  computed = document.defaultView.getComputedStyle(dnone, '');
}

dnonehref.onmouseover = function() {
  dnone.style.display = 'none';
  tmp = computed.backgroundColor;
  dnone.appendChild(document.createTextNode('No mo tests. '));
  tmp = computed.backgroundColor;
  dnone.appendChild(document.createTextNode('No mo tests. '));
  tmp = computed.backgroundColor;
  dnone.appendChild(document.createTextNode('No mo tests. '));
  tmp = computed.backgroundColor;
}
dnonehref.onmouseout = function() {
  dnone.style.display = 'inline';
}

Will it reflow?

In Chrome - no. Although it does do "restyle" (calculating non-geometric styles) every time a text node is added. Not sure why this restyling is needed.

In IE - yes. Unfortunately display: none seems to have no effect on rendering in IE, it still does reflows. I tried with removing the show/hide code and having the element hidden from the very beginning (with an inline style attribute). Same thing - reflow.

document fragment

Another way to perform updates off-DOM is to create a document fragment and once ready, shove the fragment into the DOM. The beauty is that the children of the fragment get moved into the DOM, not the fragment itself, which makes this method pretty convenient.

Here's the test/example. And will it reflow?

// test #5 - fragment - will it reflow
var fraghref = document.getElementById('fragment');
var fragment = document.createDocumentFragment();
fraghref.onmouseover = function() {
  fragment.appendChild(document.createTextNode('No mo tests. '));
  tmp = computed.backgroundColor;
  fragment.appendChild(document.createTextNode('No mo tests. '));
  tmp = computed.backgroundColor;
  fragment.appendChild(document.createTextNode('No mo tests. '));
  tmp = computed.backgroundColor;
}
fraghref.onmouseout = function() {
  dnone.appendChild(fragment);
}

In Chrome - no! And no rendering activities take place until the fragment is added to the live DOM. Then, just like with display: none a restyle is being performed for every new text node inserted. And even though the behavior is the same for fragments as for updating hidden elements, fragments are still preferable, because you don't need to hide the element (which will cause another reflow) initially.

In IE - no reflow! Only when you add the final result to the live DOM.
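
Outside of the test page, the usual shape of this technique is to build everything in the fragment and touch the live DOM once. A minimal sketch follows; the function name appendTexts and the 'mylist' id are assumptions for illustration, and the document is passed in as a parameter only to keep the function easy to try out.

```javascript
// Build all the nodes inside a fragment, then attach them in one operation.
// doc is passed in so the function is not tied to the global document.
function appendTexts(doc, parent, texts) {
  var fragment = doc.createDocumentFragment();
  for (var i = 0; i < texts.length; i++) {
    var li = doc.createElement('li');
    li.appendChild(doc.createTextNode(texts[i]));
    fragment.appendChild(li);            // off-DOM, nothing to render yet
  }
  parent.appendChild(fragment);          // single insertion into the live DOM
}

// Usage in a page (guarded so the snippet also loads outside a browser)
if (typeof document !== 'undefined') {
  var list = document.getElementById('mylist'); // hypothetical <ul id="mylist">
  if (list) {
    appendTexts(document, list, ['No', 'mo', 'tests']);
  }
}
```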

Thanks!

Thank you for reading. Tomorrow if all goes well there should be a final post related to JavaScript and then moving on to ... other topics ;)

 

DOM access optimization

Friday, December 18th, 2009

Dec 18 This post is part of the 2009 performance advent calendar experiment. Stay tuned for the articles to come.

This blog series has sailed from the shores of networking, passed down waterfalls and reflows, and arrived in ECMAScriptland. Now, turns out there's one bridge to cross to get to DOMlandia.

(OK, I need to get some sleep, evidently. Anyway.) Ara Pehlivanian talked about strategies for loading JavaScript code. Yesterday's post was about rendering and how you can prevent making things worse in JavaScript. Today's post will be about DOM access optimizations and, if all is good, tomorrow's post will round up the JavaScript discussion with some techniques for extreme optimization.

What's with the DOM

Document Object Model (DOM) is a language-independent API for accessing and working with a document. Could be an HTML document, or XML, SVG and so on. DOM is not ECMAScript. ECMAScript is just one way to work with the DOM API. They both started in the web browser but now things are different. ECMAScript has many other uses and so has the DOM. You can generate a page server side, using the DOM if you like. Or script Photoshop with ECMAScript.

All that goes to show that ECMAScript and DOM are now separate, they make sense on their own, they don't need each other. And they are kept separate by the browsers.

For example WebCore is the layout, rendering and DOM library used by WebKit, while JavaScriptCore (most recently rewritten as SquirrelFish) is the implementation of ECMAScript. In IE - Trident (DOM) and JScript. In Firefox - Gecko (DOM) and SpiderMonkey (ECMAScript).

The toll bridge

An excellent analogy I heard in this video from John Hrvatin of MSIE is that we can think of the DOM as a piece of land and JavaScript/ECMAScript as another piece of land. Both connected via a toll bridge. I tried to illustrate this analogy here.

domland and ecmaland connected with a bridge

All your JavaScript code that doesn't require a page - code such as loops, ifs, variables and a handful of built-in functions and objects - lives in ECMALand. Anything that starts with document.* lives in DOMLand. When your JavaScript needs to access the DOM, you need to cross that bridge to DOMlandia. And the bad part is that it's a toll bridge and you have to pay a fee every time you cross. So, the more you cross that bridge, the more you pay your performance toll.

How bad?

So, how serious is that performance penalty? Pretty serious actually. DOM access and manipulation is probably the most expensive activity you do in your JavaScript, followed by layouting (reflowing and painting activities). When you look for problems in your JavaScript (you use a profiler instead of shooting in the dark, of course, but still) most likely it's the DOM that's slowing you down.

As an illustration, consider this bad, bad code:

// bad
for (var count = 0; count < 15000; count++) {
    document.getElementById('here').innerHTML += 'a';
}

This code is bad because it touches the DOM twice on every loop tick. It doesn't cache the reference to the DOM element, it looks for that element every time. Then this code also updates the live DOM which means it causes a reflow and a repaint (which are probably buffered by the browsers and executed in batches, but still bad).

Compare with the following code:

// better
var content = '';
for (var count = 0; count < 15000; count++) {
    content += 'a';
}
document.getElementById('here').innerHTML += content;

Here we only touch the DOM twice, at the end. The whole time otherwise we work in ECMAland with a local variable.

And how bad is the bad example? It's over 100 times worse in IE6,7 and Safari, over 200 times worse in FF3.5 and IE8 and about 50 times worse in Chrome. We're not talking percentages here - we talk 100 times worse.

Now obviously this is a bad and made up example, but it does show the magnitude of the problem with DOM access.

Mitigating the problem - don't touch the DOM

How to speed up DOM access? Simply do less of it. If you have a lot of work to do with the DOM, cache references to DOM elements so you don't have to query the DOM tree every time to find them. Cache the values of the DOM properties if you'll do a chunk of work with them. And by cache I mean simply assign them to local variables. Use selectors API where available instead of crawling the DOM yourself (upgrade your JavaScript library if it's not taking advantage of the selectors API). Be careful with HTML Collections.

// bad
document.getElementById('my').style.top = "10px";
document.getElementById('my').style.left = "10px";
document.getElementById('my').style.color = "#dad";

// better
var mysty = document.getElementById('my').style;
mysty.top = "10px";
mysty.left = "10px";
mysty.color = "#dad";

// better
var csstext = "; top: 10px; left: 10px; color: #dad;";
document.getElementById('my').style.cssText += csstext;

Basically, every time you find you're accessing some property or object repeatedly, assign it to a local variable and work with that local variable.

HTMLCollections

HTMLCollections are objects returned by calls to document.getElementsByTagName(), document.getElementsByClassName() and others, also by accessing the old-style collections document.links, document.images and the like. These HTMLCollection objects are array-like, list-like objects that contain pointers to DOM elements.

The special thing about them is that they are live queries against the underlying document. And they get re-run a lot, for example when you loop through the collection and access its length. The fact that you touch the length requires re-querying of the document so that the most up-to-date information is returned to you.

Here's an example:

// slow
var coll = document.getElementsByTagName('div');
for (var count = 0; count < coll.length; count++) {
    /* do stuff */
}

// faster
var coll = document.getElementsByTagName('div'),
    len = coll.length;
for (var count = 0; count < len; count++) {
    /* do stuff */
}

The slower version requeries the document, the faster doesn't because we use the local value for the length. How much slower is the slower? Depends on the document and how many divs are in it, but in my tests anywhere between 2 times slower (Safari) to 200 times slower (IE7).

Another thing you can do (especially if you'll loop the collection a few times) is to copy the collection into an array beforehand. Accessing the array elements will be significantly faster than accessing the DOM elements in the collection, again 2 to 200 times faster.

Here's an example function that turns the collection into an array:

function toArray(coll) {
    for (var i = 0, a = [], len = coll.length; i < len; i++) {
        a[i] = coll[i];
    }
    return a;
}

If you do that you also need to account for the one-off cost of copying that collection to an array.

Using event delegation

Event delegation is when you attach an event listener to a parent element and it handles all the events for the children because of the so-called event bubbling. It's a graceful way to relieve the browser from a lot of extra work. The benefits:

  • You need to write less event-attaching code.
  • You will usually use fewer functions to handle the events because you're attaching one function to handle parent events, not an individual function for each child element. This means fewer functions to store in memory and keep track of.
  • Fewer events the browser needs to monitor.
  • Easier to detach event handlers when an element is removed and therefore easier to prevent IE memory leaks. Sometimes you don't even need to detach the event handler if children change, but the event-handling parent stays the same.
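
To make the idea concrete, here is a minimal sketch of a delegation helper. The name delegate and its signature are invented for this example (most JavaScript libraries ship their own version); it relies only on addEventListener, with attachEvent as the fallback for older IE.

```javascript
// Hypothetical helper: one listener on the parent handles events for all
// descendants with the given tag name.
function delegate(parent, type, tagName, handler) {
  var wanted = tagName.toUpperCase();
  function listener(event) {
    var node = event.target || event.srcElement;  // srcElement: older IE
    // Walk up from the event target until a match or the parent is reached
    while (node && node !== parent) {
      if (node.tagName && node.tagName.toUpperCase() === wanted) {
        handler.call(node, event);
        return;
      }
      node = node.parentNode;
    }
  }
  if (parent.addEventListener) {
    parent.addEventListener(type, listener, false);
  } else if (parent.attachEvent) {                // older IE
    parent.attachEvent('on' + type, listener);
  }
  return listener; // keep the reference if you need to detach later
}

// Usage in a page (guarded so the snippet also loads outside a browser)
if (typeof document !== 'undefined') {
  var menu = document.getElementById('menu');     // hypothetical <ul id="menu">
  if (menu) {
    delegate(menu, 'click', 'li', function () {
      this.className = 'on';                      // `this` is the clicked <li>
    });
  }
}
```

List items added to the parent later are handled too, with no extra event-attaching code.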

Thanks for reading!

  • Don't touch the DOM when you can avoid it, cache DOM access to local references
  • Cache length of HTMLCollections to a local variable while looping (good practice for any collections or arrays looping anyway). Copy the collection to an array if you'll be looping several times.
  • Use event delegation

 

Rendering: repaint, reflow/relayout, restyle

Thursday, December 17th, 2009

Dec 17 This post is part of the 2009 performance advent calendar experiment. Stay tuned for the articles to come.

Nice 5 "R" words in the title, eh? Let's talk about rendering - a phase that comes in the Life of Page 2.0 after, and sometimes during, the waterfall of downloading components.

So how does the browser go about displaying your page on the screen, given a chunk of HTML, CSS and possibly JavaScript?

The rendering process

Different browsers work differently, but the following diagram gives a general idea of what happens, more or less consistently across browsers, once they've downloaded the code for your page.

Rendering process in the browser

  • The browser parses out the HTML source code (tag soup) and constructs a DOM tree - a data representation where every HTML tag has a corresponding node in the tree and the text chunks between tags get a text node representation too. The root node in the DOM tree is the documentElement (the <html> tag)
  • The browser parses the CSS code, makes sense of it given the bunch of hacks that could be there and the number of -moz, -webkit and other extensions it doesn't understand and will bravely ignore. The styling information cascades: the basic rules are in the User Agent stylesheets (the browser defaults), then there could be user stylesheets, author (as in author of the page) stylesheets - external, imported, inline, and finally styles that are coded into the style attributes of the HTML tags
  • Then comes the interesting part - constructing a render tree. The render tree is sort of like the DOM tree, but doesn't match it exactly. The render tree knows about styles, so if you're hiding a div with display: none, it won't be represented in the render tree. Same for the other invisible elements, like head and everything in it. On the other hand, there might be DOM elements that are represented with more than one node in the render tree - like text nodes for example where every line in a <p> needs a render node. A node in the render tree is called a frame, or a box (as in a CSS box, according to the box model). Each of these nodes has the CSS box properties - width, height, border, margin, etc
  • Once the render tree is constructed, the browser can paint (draw) the render tree nodes on the screen

The forest and the trees

Let's take an example.

HTML source:

<html>
<head>
  <title>Beautiful page</title>
</head>
<body>

  <p>
    Once upon a time there was
    a looong paragraph...
  </p>

  <div style="display: none">
    Secret message
  </div>

  <div><img src="..." /></div>
  ...

</body>
</html>

The DOM tree that represents this HTML document basically has one node for each tag and one text node for each piece of text between nodes (for simplicity let's ignore the fact that whitespace is text nodes too) :

documentElement (html)
    head
        title
    body
        p
            [text node]

        div
            [text node]

        div
            img

        ...

The render tree would be the visual part of the DOM tree. It is missing some stuff - the head and the hidden div, but it has additional nodes (aka frames, aka boxes) for the lines of text.

root (RenderView)
    body
        p
            line 1
            line 2
            line 3
            ...

        div
            img

        ...

The root node of the render tree is the frame (the box) that contains all other elements. You can think of it as being the inner part of the browser window, as this is the restricted area where the page could spread. Technically WebKit calls the root node RenderView and it corresponds to the CSS initial containing block, which is basically the viewport rectangle from the top of the page (0, 0) to (window.innerWidth, window.innerHeight)

Figuring out what and how exactly to display on the screen involves a recursive walk down (a flow) through the render tree.
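
The relationship between the two trees can be mimicked with a toy model. This is only an illustration on plain objects, not how any browser engine is implemented: the node shape ({ tag, display, children, lines }) and the function name buildRenderTree are made up, the line counts are hardcoded where a real engine would compute them during layout, and the html node stands in for the root box.

```javascript
// Toy model only: real render trees are built inside the browser engine.
// A "DOM node" here is a plain object { tag, display, children } and a
// "text node" is { lines: n }, where n is how many lines the text wraps onto.
function buildRenderTree(domNode) {
  if (domNode.tag === 'head' || domNode.display === 'none') {
    return null;                         // invisible: no box in the render tree
  }
  var box = { name: domNode.tag, children: [] };
  var kids = domNode.children || [];
  for (var i = 0; i < kids.length; i++) {
    var kid = kids[i];
    if (kid.lines) {
      // one DOM text node becomes one box per line of text
      for (var n = 1; n <= kid.lines; n++) {
        box.children.push({ name: 'line ' + n, children: [] });
      }
    } else {
      var childBox = buildRenderTree(kid);
      if (childBox) {
        box.children.push(childBox);
      }
    }
  }
  return box;
}

// The "Beautiful page" from above, with a made-up line count for the paragraph
var dom = {
  tag: 'html',
  children: [
    { tag: 'head', children: [{ tag: 'title', children: [{ lines: 1 }] }] },
    { tag: 'body', children: [
      { tag: 'p', children: [{ lines: 3 }] },
      { tag: 'div', display: 'none', children: [{ lines: 1 }] },
      { tag: 'div', children: [{ tag: 'img' }] }
    ] }
  ]
};
var tree = buildRenderTree(dom);
```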

Repaints and reflows

There's always at least one initial page layout together with a paint (unless, of course you prefer your pages blank :)). After that, changing the input information which was used to construct the render tree may result in one or both of these:

  1. parts of the render tree (or the whole tree) will need to be revalidated and the node dimensions recalculated. This is called a reflow, or layout, or layouting. (or "relayout" which I made up so I have more "R"s in the title, sorry, my bad). Note that there's at least one reflow - the initial layout of the page
  2. parts of the screen will need to be updated, either because of changes in geometric properties of a node or because of stylistic change, such as changing the background color. This screen update is called a repaint, or a redraw.

Repaints and reflows can be expensive, they can hurt the user experience, and make the UI appear sluggish.

What triggers a reflow or a repaint

Anything that changes input information used to construct the rendering tree can cause a repaint or a reflow, for example:

  • Adding, removing, updating DOM nodes
  • Hiding a DOM node with display: none (reflow and repaint) or visibility: hidden (repaint only, because no geometry changes)
  • Moving, animating a DOM node on the page
  • Adding a stylesheet, tweaking style properties
  • User action such as resizing the window, changing the font size, or (oh, OMG, no!) scrolling

Let's see a few examples:

var bstyle = document.body.style; // cache

bstyle.padding = "20px"; // reflow, repaint
bstyle.border = "10px solid red"; // another reflow and a repaint

bstyle.color = "blue"; // repaint only, no dimensions changed
bstyle.backgroundColor = "#fad"; // repaint

bstyle.fontSize = "2em"; // reflow, repaint

// new DOM element - reflow, repaint
document.body.appendChild(document.createTextNode('dude!'));

Some reflows may be more expensive than others. Think of the render tree - if you fiddle with a node way down the tree that is a direct descendant of the body, then you're probably not invalidating a lot of other nodes. But what about when you animate and expand a div at the top of the page which then pushes down the rest of the page - that sounds expensive.

Browsers are smart

Since the reflows and repaints associated with render tree changes are expensive, the browsers aim at reducing the negative effects. One strategy is to simply not do the work. Or not right now, at least. The browser will set up a queue of the changes your scripts require and perform them in batches. This way several changes that each require a reflow will be combined and only one reflow will be computed. Browsers can add to the queued changes and then flush the queue once a certain amount of time passes or a certain number of changes is reached.

But sometimes the script may prevent the browser from optimizing the reflows, and cause it to flush the queue and perform all batched changes. This happens when you request style information, such as

  1. offsetTop, offsetLeft, offsetWidth, offsetHeight
  2. scrollTop/Left/Width/Height
  3. clientTop/Left/Width/Height
  4. getComputedStyle(), or currentStyle in IE

All of these above are essentially requesting style information about a node, and any time you do it, the browser has to give you the most up-to-date value. In order to do so, it needs to apply all scheduled changes, flush the queue, bite the bullet and do the reflow.

For example, it's a bad idea to set and get styles in a quick succession (in a loop), like:

// no-no!
el.style.left = el.offsetLeft + 10 + "px";
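
When several elements need this kind of update, one way to stay out of the browser's way is to do all the reads first and all the writes afterwards. A minimal sketch, with the function name shiftAll made up for the example:

```javascript
// Read phase first, write phase second, so the browser is not asked for
// fresh layout information right after every single style change.
function shiftAll(elements, dx) {
  var lefts = [], i;
  for (i = 0; i < elements.length; i++) {
    lefts[i] = elements[i].offsetLeft;                 // reads only
  }
  for (i = 0; i < elements.length; i++) {
    elements[i].style.left = (lefts[i] + dx) + 'px';   // writes only
  }
}
```

The interleaved version (read, write, read, write...) asks for up-to-date layout information after each write, which is exactly the pattern that forces the queue to be flushed.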

Minimizing repaints and reflows

The strategy to reduce the negative effects of reflows/repaints on the user experience is to simply have fewer reflows and repaints and fewer requests for style information, so the browser can optimize reflows. How to go about that?

  • Don't change individual styles, one by one. Best for sanity and maintainability is to change the class names not the styles. But that assumes static styles. If the styles are dynamic, edit the cssText property as opposed to touching the element and its style property for every little change.
    // bad
    var left = 10,
        top = 10;
    el.style.left = left + "px";
    el.style.top  = top  + "px";
    
    // better 
    el.className += " theclassname";
    
    // or when top and left are calculated dynamically...
    
    // better
    el.style.cssText += "; left: " + left + "px; top: " + top + "px;";
  • Batch DOM changes and perform them "offline". Offline means not in the live DOM tree. You can:
    • use a documentFragment to hold temp changes,
    • clone the node you're about to update, work on the copy, then swap the original with the updated clone
    • hide the element with display: none (1 reflow, repaint), add 100 changes, restore the display (another reflow, repaint). This way you trade 2 reflows for potentially a hundred
  • Don't ask for computed styles excessively. If you need to work with a computed value, take it once, cache to a local var and work with the local copy. Revisiting the no-no example from above:
    // no-no!
    for(big; loop; here) {
        el.style.left = el.offsetLeft + 10 + "px";
        el.style.top  = el.offsetTop  + 10 + "px";
    }
    
    // better
    var left = el.offsetLeft,
        top  = el.offsetTop,
        esty = el.style;
    for(big; loop; here) {
        left += 10;
        top  += 10;
        esty.left = left + "px";
        esty.top  = top  + "px";
    }
  • In general, think about the render tree and how much of it will need revalidation after your change. For example using absolute positioning makes that element a child of the body in the render tree, so it won't affect too many other nodes when you animate it for example. Some of the other nodes may be in the area that needs repainting when you place your element on top of them, but they will not require reflow.
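
The clone-and-swap option from the list above has no example yet, so here is a minimal sketch of it. The helper name updateOffline is invented for illustration; cloneNode and replaceChild are the standard DOM methods. One caveat to keep in mind: a clone does not carry over event listeners attached with addEventListener.

```javascript
// Hypothetical helper: apply changes to a detached deep copy, then swap it in.
function updateOffline(node, applyChanges) {
  var copy = node.cloneNode(true);            // deep copy, not in the live DOM
  applyChanges(copy);                         // any number of changes, off-DOM
  node.parentNode.replaceChild(copy, node);   // one change in the live DOM
  return copy;
}

// Usage in a page (guarded so the snippet also loads outside a browser)
if (typeof document !== 'undefined') {
  var target = document.getElementById('bye'); // id borrowed from the test page
  if (target && target.parentNode) {
    updateOffline(target, function (copy) {
      for (var i = 0; i < 100; i++) {
        copy.appendChild(document.createTextNode('No mo tests. '));
      }
    });
  }
}
```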

Tools

Only about a year ago, there was nothing that could provide any visibility into what's going on in the browser in terms of painting and rendering (not that I am aware of, it's of course absolutely possible that MS had a wicked dev tool no one knew about, buried somewhere in MSDN :P). Now things are different and this is very, very cool.

First, MozAfterPaint event landed in Firefox nightlies, so things like this extension by Kyle Scholz showed up. MozAfterPaint is cool, but only tells you about repaints.

DynaTrace Ajax and most recently Google's SpeedTracer (notice two "trace"s :)) are just excellent tools for digging into reflows and repaints - the first is for IE, the second for WebKit.

Some time last year Douglas Crockford mentioned that we're probably doing some really stupid things in CSS we don't know about. And I can definitely relate to that. I was involved in a project for a bit where increasing the browser font size (in IE6) was causing the CPU to go up to 100% and stay like this for 10-15 minutes before finally repainting the page.

Well, the tools are now here, we don't have excuses any more for doing silly things in CSS.

Except, maybe, speaking of tools..., wouldn't it be cool if the Firebug-like tools showed the render tree in addition to the DOM tree?

A final example

Let's just take a quick look at the tools and demonstrate the difference between restyle (render tree change that doesn't affect the geometry) and reflow (which affects the layout), together with a repaint.

Let's compare two ways of doing the same thing. First, we change some styles (not touching layout) and after every change, we check for a style property, totally unrelated to the one just changed.

bodystyle.color = 'red';
tmp = computed.backgroundColor;
bodystyle.color = 'white';
tmp = computed.backgroundImage;
bodystyle.color = 'green';
tmp = computed.backgroundAttachment;

Then the same thing, but we're touching style properties for information only after all the changes:

bodystyle.color = 'yellow';
bodystyle.color = 'pink';
bodystyle.color = 'blue';

tmp = computed.backgroundColor;
tmp = computed.backgroundImage;
tmp = computed.backgroundAttachment;

In both cases, these are the definitions of the variables used:

var bodystyle = document.body.style;
var computed;
if (document.body.currentStyle) {
  computed = document.body.currentStyle;
} else {
  computed = document.defaultView.getComputedStyle(document.body, '');
}

Now, the two example style changes will be executed on click on the document. The test page is actually here - restyle.html (click "dude"). Let's call this restyle test.

The second test is just like the first, but this time we'll also change layout information:

// touch styles every time
bodystyle.color = 'red';
bodystyle.padding = '1px';
tmp = computed.backgroundColor;
bodystyle.color = 'white';
bodystyle.padding = '2px';
tmp = computed.backgroundImage;
bodystyle.color = 'green';
bodystyle.padding = '3px';
tmp = computed.backgroundAttachment;

// touch at the end
bodystyle.color = 'yellow';
bodystyle.padding = '4px';
bodystyle.color = 'pink';
bodystyle.padding = '5px';
bodystyle.color = 'blue';
bodystyle.padding = '6px';
tmp = computed.backgroundColor;
tmp = computed.backgroundImage;
tmp = computed.backgroundAttachment;

This test changes the layout so let's call it "relayout test", the source is here.

Here's what type of visualization you get in DynaTrace for the restyle test.

DynaTrace

Basically the page loaded, then I clicked once to execute the first scenario (requests for style info every time, at about 2sec), then clicked again to execute the second scenario (requests for styles delayed till the end, at about 4sec)

The tool shows how the page loaded and the IE logo shows onload. Then the mouse cursor is over the rendering activity following the click. Zooming into the interesting area (how cool is that!) there's a more detailed view:

dynatrace

You can clearly see the blue bar of JavaScript activity and the following green bar of rendering activity. Now, this is a simple example, but still notice the length of the bars - how much more time is spent rendering than executing JavaScript. Often in Ajax/Rich apps, JavaScript is not the bottleneck, it's the DOM access and manipulation and the rendering part.

OK, now running the "relayout test", the one that changes the geometry of the body. This time check out this "PurePaths" view. It's a timeline plus more information about each item in the timeline. I've highlighted the first click, which is a JavaScript activity producing a scheduled layout task.

dynatrace

Again, zooming into the interesting part, you can see how now in addition to the "drawing" bar, there's a new one before that - the "calculating flow layout", because in this test we had a reflow in addition to the repaint.

dynatrace

Now let's test the same page in Chrome and look at the SpeedTracer results.

This is the first "restyle" test zoomed into the interesting part (heck, I think I can definitely get used to all that zooming :)) and this is an overview of what happened.

speedtracer

Overall there's a click and there's a paint. But in the first click, there's also 50% time spent recalculating styles. Why is that? Well, this is because we asked for style information with every change.

Expanding the events and showing hidden lines (the gray lines were hidden by Speedtracer because they are not slow) we can see exactly what happened - after the first click, styles were calculated three times. After the second - only once.

speedtracer

Now let's run the "relayout test". The overall list of events looks the same:

speedtracer

But the detailed view shows how the first click caused three reflows (because it asked for computed style info) and the second click caused only one reflow. This is just excellent visibility into what's going on.

speedtracer

A few minor differences in the tools - SpeedTracer didn't show when the layout task was scheduled and added to the queue, DynaTrace did. And then DynaTrace didn't show the details of the difference between "restyle" and "reflow/layout", like SpeedTracer did. Maybe simply IE doesn't make a difference between the two? DynaTrace also didn't show three reflows instead of one in the different change-and-touch vs. change-then-touch tests, maybe that's how IE works?

Running these simple examples hundreds of times also confirms that for IE it doesn't matter if you request style information as you change it.

Here's some more data points after running the tests with enough repetitions:

  • In Chrome not touching computed styles while modifying styles is 2.5 times faster when you change styles (restyle test) and 4.42 times faster when you change styles and layout (relayout test)
  • In Firefox - 1.87 times faster to refrain from asking computed styles in restyle test and 1.64 times faster in the relayout test
  • In IE6 and IE8, it doesn't matter

Across all browsers though changing styles only takes half the time it takes to change styles and layout. (Now that I wrote it, I should've compared changing styles only vs. changing layout only). Except in IE6 where changing layout is 4 times more expensive than changing only styles.
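
Numbers like these can be collected with a very simple repetition harness. The sketch below is an assumption about how one might do it, not the code behind the figures above; timeIt is a made-up name. Note that it only measures the script time including any style or layout work forced during the run, since browsers typically paint after the script finishes.

```javascript
// Hypothetical harness: run fn `reps` times, return elapsed milliseconds.
function timeIt(fn, reps) {
  var start = new Date().getTime();
  for (var i = 0; i < reps; i++) {
    fn();
  }
  return new Date().getTime() - start;
}

// Usage in a page (guarded so the snippet also loads outside a browser)
if (typeof document !== 'undefined' && document.body) {
  var bodystyle = document.body.style;
  var computed = document.body.currentStyle ||
      document.defaultView.getComputedStyle(document.body, '');
  var tmp;
  var touchEveryTime = timeIt(function () {
    bodystyle.color = 'red';
    tmp = computed.backgroundColor;
    bodystyle.color = 'white';
    tmp = computed.backgroundImage;
  }, 1000);
  var touchAtTheEnd = timeIt(function () {
    bodystyle.color = 'red';
    bodystyle.color = 'white';
    tmp = computed.backgroundColor;
    tmp = computed.backgroundImage;
  }, 1000);
}
```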

Parting words

Thanks very much for working through this long post. Have fun with the tracers and watch out for those reflows! In summary, let me go over the different terminology once again.

  • render tree - the visual part of the DOM tree
  • nodes in the render tree are called frames or boxes
  • recalculating parts of the render tree is called reflow (in Mozilla), and called layout in every other browser, it seems
  • Updating the screen with the results of the recalculated render tree is called repaint, or redraw (in IE/DynaTrace)
  • SpeedTracer introduces the notion of "style recalculation" (styles without geometry changes) vs. "layout"

And some more reading if you find this topic fascinating. Note that these reads, especially the first three, are more in depth, closer to the browser, as opposed to closer to the developer which I tried to do here.

 

How To Measure Web Site Performance

Wednesday, December 16th, 2009

Dec 16 This article is part of the 2009 performance advent calendar. Today's article is a contribution from Eric Goldsmith. Please welcome Eric and stay tuned for the articles to come.

Eric Goldsmith (@GoldsmithEric), Operations Architect at AOL, has more than 20 years of experience providing technical leadership in the areas of product development, engineering and operations. At AOL he has led efforts to deliver the highest levels of performance and availability for top Web sites, including: AOL.com; AIM.com; and AOL Video; among others.

His areas of expertise include Performance Analysis, Capacity Planning, Network Engineering, and Software Development. Prior to AOL, Eric worked for companies such as UUNet, WorldCom and CompuServe, as well as telecom and Internet startups. He holds a BS in Computer Science from The Ohio State University.

When trying to quantify the performance of a Web site, we most commonly mean the response time. The two most common methods of gathering response time data are from Field Metrics and Synthetic Measurement.

Field Metrics measure response time from real user traffic, and generally rely on JS instrumentation of the pages, or toolbars to collect data. Synthetic Measurement involves loading pages in one of a myriad of tools designed to collect metrics. Each method has its strengths and weaknesses - but that's a discussion for another time.

Synthetic Measurement is an easy way to get started quantifying your site performance. But there are some important guidelines for getting accurate results.

Test Speed

A common mistake people make when testing the response time of a Web site is testing on their office network. According to Speedtest.net, my office network gives me 53 Mbps. A typical DSL user gets about 1.5 Mbps, or 35 times slower.

How much difference does this make in practice?

CNN Response Times

I tested cnn.com with Webpagetest at two different speeds. At DSL speeds (1.5 Mbps) it loads in ~15 seconds, while at FIOS speeds (20 Mbps) it loads in ~3 seconds - a 5x difference.

This demonstrates how a developer testing from his workstation in the office could see a 3s load time and conclude that all is well, while a home user on DSL could see a 15s load time and abandon the site due to slowness.
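As a rough back-of-the-envelope check, here is a small sketch that estimates how long it takes just to push a page's bytes through a connection. The 1 MB page weight is a made-up number, not a measurement of cnn.com, and the function name is only illustrative. The estimate ignores latency, connection setup and parallel downloads, which is one reason measured load times do not scale in direct proportion to bandwidth.

```javascript
// Lower bound on transfer time: bits to send divided by bits per second.
// 1 Mbps is taken as 1,000,000 bits per second.
function transferSeconds(bytes, mbps) {
  return (bytes * 8) / (mbps * 1000000);
}

var pageBytes = 1000000; // hypothetical page weight of 1 MB

console.log(transferSeconds(pageBytes, 1.5).toFixed(1)); // 5.3 (DSL)
console.log(transferSeconds(pageBytes, 20).toFixed(1));  // 0.4 (FIOS)
```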

As mentioned above, you can test at different speeds with Webpagetest. And if you want to simulate different speeds from your desktop, try AKMA Labs' Network Delay Simulator.

Test Location

Does a site perform differently from LA vs. NY? US vs. UK?

This double waterfall excerpt (the waterfalls from two test executions overlaid together) demonstrates the difference in response time for a page measured from East and West coasts.

East vs. West

A difference of ~ 2 seconds of load time, based solely on where the measurement was taken.

Now, this is a particularly egregious example, and it's possible to design your site so geographic differences are minimized. But you have to know there's a problem before you can solve it.

So, where should you test your site? From where your users are. Most web analytics tools will provide this information. For example, here's what Google Analytics shows for Webpagetest:

Usage by Geography

The ability to test from different locations is available from Webpagetest (if you're interested in hosting a location, let me know), and from Keynote via their KITE tool.

Test Iterations

Can you accurately determine the response time of your site with a single measurement?

The figure below is a typical response time distribution for a Web site.

Response Time Distribution

If you only took a single measurement, where would it fall on the distribution? For example, it could be the left-most red circle - or the right-most (or anywhere else). The delta between those two points is more than 4 seconds. So, what's the response time of your site?

There are two things to consider when trying to answer that: how many measurements (samples) do you need, and how do you coalesce them into a representative number?

Determining the number of measurements needed can get complicated. But in general, 30 or more is suggested.

How to aggregate all that data into a response time number is a subject all its own, and a whole 'nother discussion. It's common practice to just average them. That's not an ideal approach, but it's a starting point, and gets the dialog started.
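To make the aggregation question concrete, here is a minimal sketch that summarizes a set of samples three ways: the mean, the median and a nearest-rank 95th percentile. The sample values are made up, and only five are used for brevity (far fewer than the 30 suggested above). The function name and the choice of percentile method are assumptions for illustration.

```javascript
// Summarize response-time samples (in seconds).
function summarize(samples) {
  if (samples.length === 0) throw new Error("need at least one sample");
  var sorted = samples.slice().sort(function (a, b) { return a - b; });

  var sum = 0;
  for (var i = 0; i < sorted.length; i++) sum += sorted[i];

  var mid = Math.floor(sorted.length / 2);
  var median = sorted.length % 2 === 1
    ? sorted[mid]
    : (sorted[mid - 1] + sorted[mid]) / 2;

  // Nearest-rank percentile: the smallest sample with at least p% of the
  // samples at or below it.
  function percentile(p) {
    var rank = Math.ceil((p / 100) * sorted.length);
    return sorted[Math.max(0, rank - 1)];
  }

  return { count: sorted.length, mean: sum / sorted.length, median: median, p95: percentile(95) };
}

// Made-up samples: four fast loads and one slow outlier.
var summary = summarize([2.1, 2.3, 2.2, 2.4, 6.5]);
console.log(summary.mean.toFixed(2)); // 3.10
console.log(summary.median);          // 2.3
console.log(summary.p95);             // 6.5
```

A single slow outlier pulls the mean well above what most of the samples look like, while the median stays put. That is one reason averaging alone can mislead.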

 

JavaScript loading strategies

Tuesday, December 15th, 2009

This article is part of the 2009 performance advent calendar experiment. Today's article is a contribution from Ara Pehlivanian, author of two JavaScript books. Please welcome Ara and stay tuned for the articles to come.

Ara Pehlivanian has been working on the Web since 1997. He's been a freelancer, a webmaster, and most recently, a Front End Engineer at Yahoo! Ara's experience comes from having worked on every aspect of web development throughout his career, but he's now following his passion for web standards-based front-end development. When he isn't speaking and writing about best practices or coding professionally, he's either tweeting as @ara_p or maintaining his personal site at http://arapehlivanian.com/.

JavaScript has a dark side to it that not many people are aware of. It causes the browser to stop everything that it's doing until the script has been downloaded, parsed and executed. This is in sharp contrast to the other dependencies which get loaded in parallel--limited only by the number of connections the browser and server are able to create. So why is this a problem?

Good question! Before I can answer that, I need to explain how the browser goes about building a page. The first thing it does once it receives an HTML document from the server is to build the DOM--an object representation of the document in memory. As the browser goes about converting HTML into the DOM, it invariably encounters references to external dependencies such as CSS documents and images. Every time it does so, it fires off a request to the server for that dependency. It doesn't need to wait for one to be loaded before requesting another; it makes as many requests as it's capable of. This way, the page gets built one node at a time and as the dependencies come in, they're put in their correct placeholders. What gums up the works though, is when a JavaScript dependency is encountered. When this happens, the browser stops building the DOM and waits for that file to arrive. Once it receives the file, it parses and executes it. Only once all of that's done does the browser continue building the DOM. I suspect this has to do with wanting to provide as stable a DOM to the script as possible. If things were in flux while the script attempted to access or even modify a DOM node, things could get dicey. Either way, the time it takes before the browser can continue depends entirely on the size and complexity of the script file that's being loaded.

Now imagine loading a 200k JavaScript file right in the <head> of a document. Say it's a JavaScript file that's not only heavy but also does some fairly complex computing that takes half a second to complete. Imagine now what would happen if that file took a second to transfer. Did you guess? Yup, the page would be blank until that transfer and the computation were complete. A second and a half of a blank page that the visitor has to endure. Given that most people don't spend more than a few seconds on the average web page, that's an eternity of staring at a blank page.

Reduce

So how can this problem be overcome? Well, the first thing that should be done, is to reduce as much as possible, the amount of data that's being sent over the pipe. The smaller the JavaScript file, the less waiting the visitor has to do. So what can be done to reduce file size? JavaScript files can be run through a minifier such as YUI Compressor (which removes unnecessary white space and formatting, as well as comments, and is proven to reduce file size by 40-60%). Also, if at all possible, servers should be set up to gzip files before they're sent. This can drastically reduce the number of bytes that get transferred since JavaScript is plain text, and plain text compresses really well.

Defer

So, once you've made sure your file is as small as possible, what next? Well, the first thing is to make sure the visitor has something to look at while the script is loading. Instead of loading JavaScript files in the document's <head>, put your <script> tags immediately before your page's closing </body> tag. That way, the browser will have built the DOM and begun inserting images and applying CSS long before it encounters your script tags. This also means that your code will execute faster because it won't need to wait for the page's onload event--which only fires once all the page's dependencies are done loading.

So with the script tags placed at the end of the document, when the browser does encounter them, it will still halt operations for however long it needs to, but at this point the visitor is reading your page and unaware of what's going on behind the scenes. You've just bought yourself the time to surreptitiously load your script files.

Go Async

There is another way to load JavaScript files which won't block your browser, and that's to insert the script tags into your page using JavaScript. Dynamically including a script tag into the DOM causes it to be loaded asynchronously. The only trouble with that is that you can't rely on the code within the script file to be available immediately after you've included it. What you'll need is a callback function that is executed once your script is done loading. There are several ways of doing this. A lot of libraries have built in async script loading functionality, so you're likely better off using that. But if you want to do it yourself, be ready to deal with the idiosyncrasies of different browsers. For example, where one browser will fire off an onload event for the script, another will not.
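Here is a minimal sketch of what a do-it-yourself loader might look like. It is not any particular library's implementation. The function name loadScript and the doc parameter are assumptions for illustration (doc exists only so the function can be tried outside a browser). It listens for both onload and the readyState changes that older IE versions report, and makes sure the callback runs only once.

```javascript
// Insert a script tag dynamically and run `callback` once it has loaded.
// `doc` defaults to the page's document.
function loadScript(url, callback, doc) {
  doc = doc || document;
  var script = doc.createElement("script");
  var done = false;

  function finish() {
    if (done) return; // guard against both events firing
    done = true;
    script.onload = script.onreadystatechange = null;
    if (callback) callback();
  }

  // Most browsers fire onload for script elements.
  script.onload = finish;

  // Older IE versions report progress through readyState instead.
  script.onreadystatechange = function () {
    if (script.readyState === "loaded" || script.readyState === "complete") {
      finish();
    }
  };

  script.src = url;

  // Insert before the first script on the page. There is always at least
  // one when this code itself was delivered in a script tag.
  var first = doc.getElementsByTagName("script")[0];
  first.parentNode.insertBefore(script, first);
  return script;
}

// In a page (file name and callback body are hypothetical):
// loadScript("fancy-widget.js", function () {
//   // code that depends on fancy-widget.js goes here
// });
```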

Be Lazy

So now that we know how to load scripts behind the scenes, is there anything more we can do to improve performance? Of course.

Say for example your page loads up a large script that gives your site a fancy navigation menu. What if the user never uses the navigation menu? What if they only navigate your site through links in your content? Did you really need to load that script in the first place? What if you could load the necessary code only when it was needed? You can. It's a technique called lazy loading. The principle is simple, instead of binding your fancy navigation script to the menu in your page, you'd bind a simple loader script instead. It would detect an onmouseover event for example, and then insert a script tag with the fancy nav code into the page. Once the tag is done loading, a callback function wires up all the necessary events and presto bingo, your nav menu starts working. This way, your site doesn't have to needlessly bog visitors down with code they'll never use.
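The lazy-loading idea can be sketched as a small wrapper that starts the expensive load at most once, on first use, and queues any callbacks that arrive while the load is in flight. The names lazy, ensureNav and fetchNavScript are hypothetical; fetchNavScript stands for whatever function inserts the script tag and calls back when it has loaded.

```javascript
// Wrap a loader so it runs at most once. Callers pass a function to run
// when the lazily loaded code is ready.
function lazy(load) {
  var state = "idle"; // "idle" -> "loading" -> "ready"
  var waiting = [];

  return function (onReady) {
    if (state === "ready") {
      onReady();
      return;
    }
    waiting.push(onReady);
    if (state === "loading") return; // a load is already in flight

    state = "loading";
    load(function () {
      state = "ready";
      var queued = waiting;
      waiting = [];
      for (var i = 0; i < queued.length; i++) queued[i]();
    });
  };
}

// In a page (all names below are hypothetical):
// var ensureNav = lazy(function (done) { fetchNavScript(done); });
// menu.onmouseover = function () {
//   ensureNav(function () { /* wire up the fancy nav events */ });
// };
```

Repeated mouseovers then cost nothing extra: the script is requested on the first one only.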

Bite Size

In keeping with lazy loading, try to also load only the core components that are needed to make your page work. This is especially the case when it comes to libraries. A lot of the time a library will force you to load up a huge amount of code when all you want to do is add an event handler, or modify class names. If the library doesn't let you pull down only what you need, try ripping out what you want and only load that instead. There's no point in forcing visitors to download 60k of code when all you need is 4k of it.

Do You Need It?

Finally, the best way to speed up JavaScript load times is to not include any JavaScript at all. A lot of times people go nuts for the latest fad and include it in their site without even asking themselves if they really need it. Does this fancy accordion thing actually help my visitors get to my content easier? Does fading everything in and out and bouncing things all over the place actually improve my site's usability? So the next time you feel like adding a three dimensional spinning rainbow tag cloud to your site, ask yourself, "do I really need this?"

Note from Stoyan:

I'd like to thank Ara for the great article, it's a pleasure for me to be the blog host!

Also wanted to offer some additional links for your reading pleasure:

Please comment if you can think of more good resources on the topic.

 

Free-falling waterfalls

Monday, December 14th, 2009

This post is part of the 2009 performance advent calendar experiment. Stay tuned for the articles to come.

In this series of performance posts, so far we've looked at having fewer components in the waterfall (meaning fewer HTTP requests) and also making the components as small as possible. The next task is to make sure that the waterfall is as short as possible - meaning let it fall freely, without interruptions and have the browser download as many components as possible in parallel.

Some ways to make the waterfall fall free include having:

  • fewer DNS lookups
  • parallel downloads using several domains (this point contradicts the previous, ain't performance fun)
  • fewer redirects (ideally no redirects)
  • non-blocking scripts and styles
  • smaller request/response headers, which includes using fewer cookies

Reducing DNS lookups

When you request a component, the browser needs to resolve the hostname of the component to an IP address. This is known as a DNS lookup and you can see those lookups in the waterfall charts. DNS lookups may look negligible (plus they get cached by browsers and operating systems), but they sometimes take a ridiculous amount of time. It depends on many factors, often beyond your control, so the best thing to do is change what you do control and that is - require fewer DNS lookups. Carlos Bueno has an excellent writeup on DNS lookups here.

You should limit the number of DNS lookups the browser needs to perform, ideally to no more than 2 to 4, according to Yahoo's studies.

Parallel downloads

Browsers have limits on how many components they download from the same domain at the same time. In older browsers, including IE6 and IE7 this limit is 2. This can definitely slow down your waterfall significantly, when you have a greater number of components to download.

Newer browsers have increased that limit to 4 (Safari, Opera 10) or 6 (FF3, IE8), so this should be less of an issue. But in the end it depends on your page - how many components you have and how many of your visitors are on IE6 or IE7.

Below is an image of how IE7 loads a page with 8 images, where each image is artificially delayed to take 2 seconds. Downloading two components at a time, IE spends at least 8 seconds on the images (2-4-6-8) or a total time of over 9 seconds.

Loading the same page in IE8 is shown below. IE8 loads 6 components at a time, so the 8 images are loaded in two batches (1st - 6 images, 2nd - 2 images), for a total time of 4 seconds spent on images. Overall, the whole page loads in 5 seconds.
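The batch arithmetic in the two examples above can be written down as a tiny sketch. It assumes every component takes the same time and all of them come from one hostname, which is true for this artificial test page but rarely for real ones. The function name is just for illustration.

```javascript
// Time spent downloading `count` equally slow components when at most
// `maxParallel` of them can be in flight at once.
function batchedDownloadSeconds(count, secondsEach, maxParallel) {
  return Math.ceil(count / maxParallel) * secondsEach;
}

console.log(batchedDownloadSeconds(8, 2, 2)); // 8 (IE7: four batches of 2)
console.log(batchedDownloadSeconds(8, 2, 6)); // 4 (IE8: a batch of 6, then 2)
```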

Now, to work around this limitation in older browsers a common technique is to create dummy subdomains, like img1.example.org, img2.example.org and so on, so that more components can be downloaded in parallel (the limitation is per domain). If you're going to do this, remember to balance this optimization with the fewer DNS lookups recommendation, don't spread on too many domains. Look carefully at your waterfalls to find the balanced point. Again, the general recommendation is that a page should require up to 2-4 domains tops.
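If you do spread components over subdomains, it helps to pick the subdomain deterministically, so the same component always gets the same URL and stays cacheable. Here is a minimal sketch of one way to do that. The function name and the simple string hash are assumptions for illustration, not a standard technique from any library.

```javascript
// Map a component path to one of `shardCount` hostnames such as
// img1.example.org, img2.example.org. The same path always maps to the
// same hostname, so the browser cache is not defeated by shifting URLs.
function shardHost(path, shardCount, baseDomain) {
  var hash = 0;
  for (var i = 0; i < path.length; i++) {
    // Simple rolling hash, kept small with a modulus.
    hash = (hash * 31 + path.charCodeAt(i)) % 1000003;
  }
  return "img" + ((hash % shardCount) + 1) + "." + baseDomain;
}

var path = "/images/logo.png"; // hypothetical component
console.log("http://" + shardHost(path, 2, "example.org") + path);
```

Keeping shardCount at 2 here is in line with the 2-4 domains recommendation above.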

Quick sidetrack: two URLs for your toolbox.

  1. For the up-to-the-moment insight into different browser limits and capabilities, bookmark Google's BrowserScope project. Check the "Network" tab for example to see the parallel downloads limitations across browsers
  2. Cuzillion (by Steve Souders) lets you quickly create test pages. The waterfalls above are actually coming from a page created with Cuzillion and tested in AOL's WebPageTest

No redirects

Redirects are bad for your waterfall. They do nothing for the user but just slow down the experience. Think about what happens: the browser makes a request, waits for the response, the response says "no, no, go get your component from way over there", so the poor old browser starts again - makes a new request, waits for the response.

So, avoid redirects, be they server-side redirects or client-side (JavaScript or meta-tag redirects).

As an illustration of how bad redirects can be, consider the waterfall below. It's from a real page, not a made-up case. It looks like the page could finish loading after about 1.1 seconds, but then a redirect occurs at 0.9s, takes half a second and then points to a 1x1 blank GIF. Obviously it's some sort of stats tracking image. But the bad parts are that: a/ it's an IMG tag, therefore it delays onload and b/ there's a redirect. In the end, the page loads in 1.7 seconds instead of 1.1 seconds. The user experience suffers for no reason. The way to fix this is to simply remove the image from the IMG tag and load it with new Image().src = "1x1.gif", this way taking it out of the onload flow. Then remove that redirect. For such stats tracking cases a 204 No Content response is the appropriate way to go.
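A minimal sketch of the new Image() approach follows. The function name fireBeacon and the ImageCtor parameter are assumptions for illustration (the parameter exists only so the function can be tried outside a browser), and firing it from the onload handler is one possible choice of timing.

```javascript
// Request a stats-tracking URL without putting an IMG tag in the markup.
// `ImageCtor` defaults to the browser's Image constructor.
function fireBeacon(url, ImageCtor) {
  var img = new (ImageCtor || Image)();
  img.src = url; // assigning src is what starts the request
  return img;
}

// In a page (the URL is hypothetical):
// window.onload = function () {
//   fireBeacon("1x1.gif");
// };
```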

Blocking script and styles

Scripts block downloads, hence slow down your free-falling waterfall. This is an important topic which deserves an article of its own, so stay tuned.

And what about stylesheets, do they block other downloads in the waterfall? Turns out stylesheets are mostly fine, but they could also block in these cases:

  • in Firefox before version 3 (probably no need to worry about it)
  • in all browsers, if followed by an inline script

The second one is as interesting as it is surprising. It's probably not a good idea to have inline script tags scattered all around the HTML to begin with. And since this can cause the stylesheets to block the downloads of the other components, it should be avoided at all costs.

Be sure to check Steve Souders' blog post for more information. Credit to Steve - I believe he was the first to take note and report this issue.

So check your waterfall: if you see a stylesheet that blocks, look around it in the markup for any inline script tags that can be moved further down.

Cookies and other HTTP headers

We talked about making the responses smaller. But we can also optimize the requests by making the HTTP headers smaller. You can take a look at your request and response headers and see if you're not sending too much.

Looking at my blog I see this Server header:

Server: Apache/2.2.11 (Unix) mod_ssl/2.2.11 OpenSSL/0.9.7a Phusion_Passenger/2.2.4 mod_auth_passthrough/2.1 mod_bwlimited/1.4 FrontPage/5.0.2.2635

Seems to me like too much information.

There's also an ETag:
Etag: "9f38013-2af0-453b354a20bc0"

While ETags may help with caching, when you have a far-future Expires header, they are not necessary. And if you have a multi-machine setup, ETags can actually be bad (YSlow will warn you about this issue).

While you have some control over the response headers, you have very little control over the request headers. It's your users' browsers that control them. But you do have control over the Cookie header and this is where the bigger savings will actually come from, because Cookie is often the biggest part of the HTTP headers.

Reduce cookies

You should aim at sending the least possible amount of cookies. Be careful when you write cookies. Make them smaller and write them to the appropriate sub-domain name. If you have a blog at blog.example.org and a main site at www.example.org, then don't write the blog.example.org cookies at *.example.org level.
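As a small illustration of scoping a cookie to the sub-domain that needs it, here is a sketch that builds the string you would assign to document.cookie. The function name and the cookie name/value are made up for the example.

```javascript
// Build a cookie string with an explicit domain and path.
function cookieString(name, value, options) {
  var parts = [encodeURIComponent(name) + "=" + encodeURIComponent(value)];
  if (options && options.domain) parts.push("domain=" + options.domain);
  if (options && options.path) parts.push("path=" + options.path);
  return parts.join("; ");
}

// Scoped to the blog only, so www.example.org requests do not carry it.
console.log(cookieString("theme", "dark", { domain: "blog.example.org", path: "/" }));
// theme=dark; domain=blog.example.org; path=/

// In a page: document.cookie = cookieString("theme", "dark", { domain: "blog.example.org", path: "/" });
```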

Hotmail have talked about how they compress their cookies before writing them. That's also an idea if you have big Cookie headers.

Cookie-less domains for components

Better yet, for static components that don't have any use for these cookies, just don't send them. Setup static.example.org and don't write cookies for this domain. Then put your static components there.

A curious piece of stats here - Philip Dixon reported (slides) that after Shopzilla.com moved their static components to a cookie-free domain they made more money.

Images to non-cookie domain resulted in
0.5% top line revenue increase!

"top line" means revenue (maybe you know that but I had to check :)). This is a fascinating idea - that you can make more money by improving something as simple as cookie-less components. So, every little bit helps. Keep making your site faster and ... you never know.

www or no-www

This point also adds to the good old www vs. no-www flamewar. If you opt for no www, then in IE you cannot write cookies to example.org, but you'll write to *.example.org. This means your static.example.org will see all the cookies too. This post has some more info on the topic.

If you've already polluted your top-level domain with long-term cookies, the remedy would be to just buy a new "clean" domain, like examplestatic.org, never ever write cookies to it and use it for your static components.

Thanks!

And that's it for today's post, thank you for reading and may HTTP be with you ;)

 

Give PNG a chance

Sunday, December 13th, 2009

This post is part of the 2009 performance advent calendar experiment. Stay tuned for the articles to come.

People are often afraid to use PNG because they think that:
a/ it doesn't work in all browsers, or
b/ filesizes are bigger than GIF

While these have some grain of truth to them, they are mostly misconceptions. Before addressing them, one quick background point - what's PNG8 and why it's cool.

PNG8

There are several types of PNG files, which can be grouped into these main kinds:

  • Truecolor PNGs with or without alpha transparency channel, also known as PNG24 and/or PNG32 (the one with alpha)
  • Grayscale PNGs with or without alpha
  • Indexed PNG, aka palette PNG, aka PNG8

PNG8 is like a GIF - it has a palette of 256 colors and supports transparency. While GIF supports true/false transparency (a pixel is either transparent or it isn't), PNG8 supports variable alpha transparency. Right there you see - PNG8 can do anything that GIF can, plus more.

There's a little glitch in IE6 where semi-transparent pixels in PNG8 are seen as fully transparent, just like a GIF. So here's an option for progressive enhancement - you use the same image and IE6 gets a degraded GIF-like experience, while modern browsers get the full experience.

Here's an example, taken from this excellent article - modern browsers get the light bulb with the glow:

IE6 and under get the gracefully degraded experience and no glow:

Another pain point is that Photoshop doesn't produce semi-transparent PNG8 (although they came up with the name "png8" instead of saying palette or indexed PNG). Only Fireworks does export alpha transparent PNG8, which makes it a bit of a challenge. You also need a good designer to undertake this tricky task of making sure the image looks OK in both experiences. One way is to assume you're working with a GIF and then upgrade the experience with the carefully selected semi-transparent pixels. It could also help you keep the gif-like version in a layer and use other layers for the semi-transparent stuff, so you can quickly preview what the image will look like in IE6.

In any event - the important thing to remember is that in the worst case (IE6) PNG8 is as good as a GIF.

PNG doesn't work in browsers?

PNG works in browsers since forever with the exception of two edge cases:

  • the glitch where PNG8's semi-transparency is gone in IE6 (see above), but here GIF can't help you either
  • transparency in truecolor PNGs is shown as a solid (usually grey) color in IE6. But again - GIF can't help here either, because it doesn't support alpha (variable) transparency to begin with. People often use GIF to "solve" this problem (moving to GIF will mean potentially losing colors), but if you can solve with a GIF, you can solve it even better (and with smaller filesize) with PNG8

Another solution to the second problem is to use IE's AlphaImageLoader CSS filter (and there's a number of scripts to do so automatically), but this filter has serious performance drawbacks and should only be used as a last resort. Three things to try before resorting to AlphaImageLoader:

  1. Try PNG8 for progressively enhanced experience
  2. Try without transparency - if the background is a solid color, convert the image to use the solid color. In imagemagick you can use -flatten for this purpose:
    $ convert source.png -background yellow -flatten result.png
  3. Forget about IE6 :)

If you end up using AlphaImageLoader, make sure you use the underscore hack so that only IE6 users experience the performance degradation.

#some-element {
    background: url(image.png);
    _background: none;
    _filter:progid:DXImageTransform.Microsoft.AlphaImageLoader(src='image.png', sizingMethod='crop');
}

PNGs are bigger than GIFs?

This misconception comes from the fact that people compare truecolor PNG with GIF, which is not a fair comparison because you're often comparing an image with thousands of colors (the PNG24) to an image with 256 colors (GIF). Often people work on an image in Photoshop or another program and when they decide to export for the web, they try PNG24, see that it's bigger and switch to GIF. But in this step GIF may strip a lot of colors. And if you're going to strip colors, well, PNG8 will give you the same colors and a smaller filesize. (Another thing is that sometimes Photoshop does a poor job exporting the PNG8. If the PNG8 looks crappy, but the GIF is OK, then export as a GIF but then convert to PNG with another utility, such as optipng.)

Again - PNG8 is the file format comparable to GIF and it's almost always smaller in filesize than GIF.

Comparing GIF vs. PNG filesizes

(This and the next experiment are something I did over a year ago, bored to death in the middle of the ocean on board a Carnival cruise ship, but since then I never really looked at the data. So here's my chance to flush some old data and clean up 20 Gigs of lets-keep-just-in-case test images :) )

Using Yahoo! image search web service I downloaded some GIFs (matching the queries "logo", "mail" and "graph"), and ended up with a little over 1700 images. Then I used optipng to convert them all to PNG and see the results.

I used OptiPNG simply with no special options:

$ optipng *.gif

As the next experiment will show, optipng can do better, so can pngout for example. So consider these results the least you can do to make GIFs smaller (by turning them to PNG)

So some stats from the experiment:

  • The average, median actually, GIF image on the web (last year, judging from this small sample) is 525x388 and has 139 colors (I just love semi-useless stats ;) )
  • The median GIF is 24K
  • After conversion to PNG, the median becomes 18K
  • The median savings from converting all GIFs to PNG is about 23%
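A note on reading these numbers: the median savings is the median of the per-image savings, which is not the same thing as comparing the median GIF size with the median PNG size. A minimal sketch with three made-up images (the sizes are invented, not taken from the experiment's data) shows how the two can differ.

```javascript
// Median of a list of numbers.
function median(values) {
  var sorted = values.slice().sort(function (a, b) { return a - b; });
  var mid = Math.floor(sorted.length / 2);
  return sorted.length % 2 ? sorted[mid] : (sorted[mid - 1] + sorted[mid]) / 2;
}

// Percentage saved going from `before` bytes to `after` bytes.
function savingsPercent(before, after) {
  return (before - after) * 100 / before;
}

// Made-up file sizes in bytes.
var images = [
  { gif: 20000, png: 19000 },
  { gif: 24000, png: 12000 },
  { gif: 30000, png: 18000 }
];

var perImage = images.map(function (img) { return savingsPercent(img.gif, img.png); });
var medianGif = median(images.map(function (img) { return img.gif; }));
var medianPng = median(images.map(function (img) { return img.png; }));

console.log(median(perImage));                     // 40 (median of per-image savings)
console.log(savingsPercent(medianGif, medianPng)); // 25 (savings between the two medians)
```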

Interestingly enough 4% of the images were smaller as GIFs - utter disappointment (and don't tell anyone!). So I had to try just a little harder. I didn't run OptiPNG with its best -o7 option, but ran PNGOut instead. The result is that now only 4 of the 1706 images were smaller as GIF. I'm pretty sure that trying a little harder (with PNGSlim, see yesterday's post) would've probably fixed it, but 4 out of 1700 is something I could live with. BTW, for the images where OptiPNG failed to produce a smaller PNG, PNGOut achieved median savings of 21%. Not bad for taming the few shrew GIFs.

BTW, some GIFs lost over 100K of filesize, the max was almost 600K in savings! So, you never know.

If you like to look at numbers, here's a csv dump - the optipng results and the selected few that ran through PNGout.

So, take-home message: turn your GIFs into PNGs and win at minimum 20% fewer bytes over the network.

Comparing PNG optimizers

For this experiment I downloaded over 12000 images (again, Yahoo! search API) and ran them through a bunch of optimizers, sometimes with different options. In retrospect, it's probably not that useful of an experiment, because (see previous post) different optimizers specialize in different areas - compression, pre-compression filtering, chunks removal, etc, and your best bet is to run several tools. But still it's at least some data points (the csv dump is here).

The images were 1000 matches for each of the searches for "baby", "background", "bkg", "flower", "graph", "graphic", "icon", "illustration", "kittens" (of course), "logo", "monkeys", "png", "transparency". After removing 4xxs, 5xxs and other mishaps and cleaning up a bit, I ended with over 10000 images. I ran them through:

  • pngcrush - pngcrush -rem alla -reduce before.png after.png
  • pngcrush-none - to keep all chunks pngcrush -rem none -reduce before.png after.png
  • pngcrush-brute - more filter attempts - pngcrush -rem alla -brute -reduce before.png after.png
  • pngout - pngout /q /y /force before.png after.png. default compression level in PNGOut is "extreme", so I tried two less extreme below
  • pngout-match - pngout /s2 /q /y /force before.png after.png
  • pngout-intense - pngout /s1 /q /y /force before.png after.png
  • pngrewrite - pngrewrite before.png after.png. PNGRewrite only works with PNG8, it also converts truecolor to PNG8 whenever the truecolor happens to be under 256 colors
  • optipng - optipng before.png -force -out after.png. OptiPNG's default level is 2 (of 7) so I had to try below and above the default:
  • optipng1 - optipng before.png -o1 -force -out after.png
  • optipng3 - optipng before.png -o3 -force -out after.png
  • optipng7 - optipng before.png -o7 -force -out after.png
  • advpng - cp before.png after.png; advpng -z -f -q after.png
  • advpng-insane - with the "insane" 4th level of compression cp before.png after.png; advpng -z4 -f -q after.png
  • deflopt - cp before.png after.png; deflopt -s -f after.png
  • pngoptimizercl - cp before.png after.png; pngoptimizercl -file:"after.png"

And the results:

Tool             Median time to run   Median savings   Success rate
pngcrush         0.25s                6.06%            93.85%
pngcrush-none    0.23s                5.58%            90.22%
pngcrush-brute   3.08s                8.10%            96.31%
pngout           1.89s                12.21%           94.35%
pngout-match     0.22s                13.89%           44.57%
pngout-intense   1.63s                12.10%           94.22%
pngrewrite       0.07s                29.84%           22.37%
optipng          0.23s                7.32%            93.21%
optipng1         0.10s                4.24%            85.16%
optipng3         0.66s                7.10%            94.26%
optipng7         4.13s                7.57%            94.81%
advpng           0.34s                11.55%           52.47%
advpng-insane    0.76s                15.64%           56.09%
deflopt          0.34s                0.44%            96.94%
pngoptimizercl   0.48s                9.71%            97.99%

"Success rate" is how often the tool managed to produce a smaller result than the original. For example PNGRewrite's success rate is pretty low, because it only works with up to 256 colors. Median time to run is the median value that the tool takes to optimize one image.

And now, madames et monsieurs, introducing...

Give PNG a chance (.com)

I hope you'll find this as funny as I do, I thought it was pretty funny, at least in my head :)

My secret goal was that everybody who hears the song or watches the video, will think twice the next time when doing "Save for the Web..." in Photoshop.

Enjoy!

Music: Drums from GarageBand, I play two guitars, also bass (a guitar with effect actually) and vocals. If you think you hear a woman's voice, it's still me, with "Helium" effect. The MP3 is here. If you want to experiment with the song yourself, here's a zip with each channel as an MP3.

Video: It may be lousy, but it's all web dev :) It's all JavaScript and CSS. The video is a screen capture of the Safari window. Also there are no images, only HTML entities. Heavy use of -webkit-* animations and transitions. The source and a live version you can play in Safari is here. The StarWars-like effect is borrowed from here.

The http://givepngachance.com URL is currently pretty blank, but I intend to add more PNG-related stuff there. Oh, and the lyrics.

Thanks!

Thanks for reading. And watching. And listening. Peace. And give PNG a chance :)

 

Big list of image optimization tools

Saturday, December 12th, 2009

This post is part of the 2009 performance advent calendar experiment (12 articles down, 12 more to go). Stay tuned for the articles to come.

Let's continue the topic of reducing file sizes started with the previous post and talk about making images smaller.

Engineer's guide to smaller images

Just to set the frame of the discussion - this is not about using Photoshop or setting the quality of the JPEGs and so on. I realize that we, web developers, wear many hats - we're designers, client/server coders, Apache/Linux admins, database heroes. But this post is not about using image programs and assumes that you or your designer has already created the images to be used on the site with the appropriate colors, quality and so on.

Now, repeat after me: you should never take an image from the designer and put it up on the web.

Most often this image is bigger than it should be. It's not the designer's fault, it's usually the software used to produce the image.

You shouldn't put an image up on your server before running it through a few tools. These tools are free, open source, cross-platform and can be run on the command line, hence scripted and run in batches over a large number of files - by you, or even better, automatically by a build deployment process. It's OK to batch-run those files without human intervention, because these tools simply optimize the files, they don't change the pixel information, so the "after" images look exactly like "before", only smaller.

Selecting the right file format

The first step towards leaner images is to select the correct file format. There are three options:

  1. JPEG for photos. Photos contain millions of colors and smooth transitions of colors. Blue skies, clouds, sunset, your dog, lolcats - all photos.
  2. GIF is for the occasional "loading..." progress animation. This is it, no other uses for GIFs.
  3. PNG is for everything that's not a photo or an animation. That includes all icons, graphs, buttons, gradients, and what not. Any image with sharp transitions of colors. Think (but don't use for) text. Sharp transitions become "dirty" in JPEG.

PNG is an interesting topic for a follow-up post, so for now let's stop here. If it's not a photo, it should be PNG. One edge case is a screenshot. It depends on what's on the screen of course, but most often JPEG will give a smaller size if you can live with the artifacts around the sharp edges (like text).

Optimizing GIFs

Unfortunately many people still use GIFs even for non-animated images. That's a mistake. PNG is a superior format and yields smaller file sizes.

People still use GIFs because they think either that a/ GIF is smaller than PNG or b/ there's a lack of support for PNG in browsers. These are misconceptions and I'll talk more about them tomorrow.

So, the way to optimize a GIF is to convert it to PNG.

You can use many tools to turn your GIFs to PNGs, including ImageMagick and OptiPNG.

# option 1: ImageMagick (if you know the filename)
$ convert logo.gif logo.png

# option 2: ImageMagick again (if you just convert all files in a directory)
$ mogrify -format png *.gif

# option 3: OptiPNG
$ optipng *.gif

These are just some of the options, I'm sure there are others.

After the conversion you can optimize the new PNGs like all other PNGs.
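
If you want to see what the conversion buys you, here is a minimal sketch that converts one GIF and prints both sizes. The function names (file_size, png_name_for, gif_to_png) are made up for this example. It assumes ImageMagick's convert is on the PATH and that the GIF is not animated.

```shell
# Print the size of a file in bytes.
file_size() {
  wc -c < "$1" | tr -d ' '
}

# Map a GIF filename to the PNG filename it would be converted to.
png_name_for() {
  echo "${1%.gif}.png"
}

# Convert one non-animated GIF with ImageMagick and report both sizes.
# Assumes `convert` is on the PATH; the GIF itself is left untouched.
gif_to_png() {
  gif=$1
  png=$(png_name_for "$gif")
  convert "$gif" "$png" || return 1
  echo "$gif: $(file_size "$gif") bytes, $png: $(file_size "$png") bytes"
}

# Usage (run in a directory of GIFs):
#   for f in *.gif; do gif_to_png "$f"; done
```

The PNG straight out of the converter is not yet optimized, so compare sizes again after running the PNG tools from the next section.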

Optimizing PNGs

There are various ways to write a PNG file. Unfortunately not all image editing programs do a good job at writing PNGs for the minimal file size.

Luckily, to fill the void, there's a great number of tools that excel in writing small PNGs. There are different ways to optimize a PNG:

  1. Stripping out "chunks" - PNG is an extensible format. Extensions come in the form of chunks and most chunks are not needed for the web.
  2. Reducing the number of colors and switching between PNG types - truecolor PNG, grayscale, palette...
  3. Choosing the best "filter". Filters are a pre-compression step. You can compress any type of file, but when you know the file is an image, you can do better. Filters are for this purpose.
  4. Optimizing the actual DEFLATE compression algorithm

Different tools specialize in one or more of these areas, so the more tools you run, the better the results will be. But you have to run at least one tool, always. You'll be surprised how unoptimized most PNGs coming from common commercial image programs are.

So, to optimize a PNG you should run as many of the following programs as possible:

# optipng (skip -o7 to run faster)
$ optipng -o7 my.png

# pngcrush (skip -brute to run faster)
$ pngcrush -rem alla -brute -reduce my.png my.png.temp
$ mv my.png.temp my.png

# pngout - closed source, non-windows binaries here
# (add parameter -s2 to run faster)
$ pngout my.png

# advpng (use -z2 to run faster)
$ advpng -z4 my.png

# deflopt - windows only
$ deflopt my.png

Other tools to note include PNGrewrite, PNGNQ and PNGquant, but they are limited because they deal only with PNG8 (256 colors) files. PNGNQ and PNGquant are actually converters from truecolor to PNG8, so they are not guaranteed to be lossless. PNGrewrite is safe to use, it will just silently fail if the file has more than 256 colors, so there's nothing to lose.

Oh, and another excellent tool - PNGOptimizer. It's Windows-only and has both a command line interface and a GUI.
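
Since the advice is to run as many tools as you have, here is a rough sketch of chaining them from one script. The function names (file_size, keep_if_smaller, optimize_png) and the .tmp.png naming are assumptions for this example. Only the flags already shown above are used. Each tool works on a copy, and the copy replaces the original only if it came out smaller.

```shell
# Print the size of a file in bytes.
file_size() {
  wc -c < "$1" | tr -d ' '
}

# Replace the original ($2) with the candidate ($1) only when the
# candidate is non-empty and strictly smaller; otherwise discard it.
keep_if_smaller() {
  candidate=$1
  original=$2
  if [ -s "$candidate" ] &&
     [ "$(file_size "$candidate")" -lt "$(file_size "$original")" ]; then
    mv "$candidate" "$original"
  else
    rm -f "$candidate"
  fi
}

# Run whichever of the tools are installed, one after another.
optimize_png() {
  png=$1
  tmp=${png%.png}.tmp.png
  if command -v optipng >/dev/null 2>&1; then
    # optipng rewrites the file in place, so work on a copy
    cp "$png" "$tmp" && optipng -o7 "$tmp" >/dev/null 2>&1
    keep_if_smaller "$tmp" "$png"
  fi
  if command -v pngcrush >/dev/null 2>&1; then
    # pngcrush writes its result to a second file
    pngcrush -rem alla -brute -reduce "$png" "$tmp" >/dev/null 2>&1
    keep_if_smaller "$tmp" "$png"
  fi
  if command -v advpng >/dev/null 2>&1; then
    # advpng also rewrites in place
    cp "$png" "$tmp" && advpng -z4 "$tmp" >/dev/null 2>&1
    keep_if_smaller "$tmp" "$png"
  fi
}

# Usage:
#   for f in *.png; do optimize_png "$f"; done
```

The size check means a tool that fails or produces a bigger file can never make things worse, which is what makes it reasonable to run this unattended.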

PNGSlim for the hardcore PNG optimization

If you're really serious about optimizing your PNGs, the tool is called PNGSlim. It's a Windows-only batch file that runs pretty much all tools above and runs them (especially PNGOut) with all kinds of parameters, hundreds of times. So it can take a while to run.

Optimizing JPEGs

JPEG is a lossy format (you lose information every time you save it, even if you choose 100% quality), but there are some operations that can be done losslessly - such as tweaking comments and meta information, cropping, and rotating by 90, 180 or 270 degrees. The tool that does this magic is called JPEGTran and is likely already on your unix/linux box. If not - here's how to install it (for Windows - get the .exe here).

So to optimize a JPEG losslessly you remove the meta information and optimize the so called Huffman tables. For bigger JPEGs (bigger than 10K) you can also convert the image to progressive coding.

# strip meta and optimize the Huffman tables
$ jpegtran -copy none -optimize source.jpg > destination.jpg

# strip meta and convert to progressive coding
$ jpegtran -copy none -progressive source.jpg > destination.jpg

# keep all meta but still optimize
$ jpegtran -copy all -optimize source.jpg > destination.jpg
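
The 10K rule of thumb is easy to script. Below is a minimal sketch, not a definitive recipe: the function names and the threshold variable are made up for this example, and "10K" is read as 10240 bytes, which is an assumption. The -optimize switch is jpegtran's option for optimizing the Huffman tables.

```shell
# Assumption: "bigger than 10K" means more than 10240 bytes.
PROGRESSIVE_THRESHOLD=10240

# Print the jpegtran switches to use for a given source file.
jpegtran_flags() {
  size=$(wc -c < "$1" | tr -d ' ')
  if [ "$size" -gt "$PROGRESSIVE_THRESHOLD" ]; then
    echo "-copy none -progressive"
  else
    echo "-copy none -optimize"
  fi
}

# Losslessly optimize one JPEG into a new file.
# Note that -copy none strips all meta information.
optimize_jpeg() {
  src=$1
  dest=$2
  # The switches are left unquoted on purpose so they split into words.
  jpegtran $(jpegtran_flags "$src") "$src" > "$dest"
}

# Usage:
#   optimize_jpeg source.jpg destination.jpg
```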

Important note on stripping meta

Only strip meta information from images that you own the rights to or have permission to modify. Otherwise you're committing a crime. Photographers put important copyright information in meta markers.

Optimizing GIF animations

Remember - no GIFs other than animations. For animations, run GIFsicle (pronounced "yo' mama" :) ) before you put them up:

# GIFSicle
$ gifsicle -O2 source.gif > destination.gif

More tools?

These are the core tools for image optimization. There are a number of wrapper tools that are more or less UIs on top of these, because there are people who don't like consoles (really?!). I'll list the few that I can think of; please comment if you know of others, especially for Windows. It's nice to have UIs to give to designers so they can drag-and-drop optimize images too.

  • smush.it, created by yours truly and Nicole Sullivan, now part of YSlow - runs pngcrush, jpegtran, gifsicle
  • PageSpeed runs optipng, jpegtran
  • PunyPNG - originally inspired by smush.it, but more advanced
  • ImageOptim - Nice easy UI for Mac, runs most of the tools above (hope your company firewall doesn't block the site because of the domain name ;))
  • PNGSquash another UI for Mac, runs advpng, pngcrush, optipng
  • PNG Monster is for Windows, runs many PNG tools, you can drag/drop on it
  • IrfanView - my favorite image viewer for Windows has a plugin to use PNGOut
  • WP-Smushit is a WordPress plugin by Alex Dunae which sends all your image uploads to smush.it for optimization. Talk about easy!

Thanks!

Thank you for reading. Now you have a whole lot of tools/toys to install and play with. Image optimization is an easy way to improve performance, it's just running a bunch of tools. You don't need to worry that the quality will suffer (so the designer won't be disappointed in you :)). So you can only win. You may win quite a bit, you may win just a little (anywhere between 5 and 30% savings is what I've seen on random live sites when I was working on and testing smush.it). The thing is you'll almost always win something.

And because it's human to forget to optimize the images before you push them live, do take the time to set up the optimization step as part of the automated deployment process.
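
As a starting point for such a build step, here is a dry-run sketch that walks a directory and prints which kind of optimization each file would get. The function names and the step labels are made up for this example. It goes by lowercase file extension only, which is an assumption, so it cannot tell an animated GIF from a static one.

```shell
# Print the optimization step for one file, chosen by its extension.
step_for() {
  case $1 in
    *.gif)        echo "gif-to-png-or-gifsicle" ;;
    *.png)        echo "png-tools" ;;
    *.jpg|*.jpeg) echo "jpegtran" ;;
    *)            echo "skip" ;;
  esac
}

# Dry run: list every file under a directory with its planned step,
# one "step<TAB>path" pair per line. Nothing is modified.
plan_images() {
  find "$1" -type f | while IFS= read -r f; do
    printf '%s\t%s\n' "$(step_for "$f")" "$f"
  done
}

# Usage:
#   plan_images ./htdocs/images
```

Starting with a dry run lets you review what the deployment script would touch before you swap the echo for the real commands.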

And to summarize once again the steps of what this automated process would be:

  1. Convert GIFs to PNG. Then relax and take a deep breath (instead of flaming the unfortunate soul who created the GIFs) while you string replace "gif" with "png" in all your stylesheets.
  2. Run PNGs through optipng, pngcrush, pngout, any or all of the tools listed above
  3. Run JPEGtran
  4. Run GIFsicle
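
The string replace from step 1 can be a one-line sed filter. This is a sketch under assumptions: the function name is made up, and it blindly rewrites every ".gif" it sees, so stylesheets that reference animated GIFs you are keeping need to be handled by hand.

```shell
# Rewrite every ".gif" reference to ".png" in a stylesheet read from stdin.
# Assumption: each referenced GIF has already been converted to a PNG
# with the same base name.
gif_refs_to_png() {
  sed 's/\.gif/.png/g'
}

# Usage:
#   gif_refs_to_png < style.css > style.new.css
```

Writing to a new file instead of editing in place keeps the original around so you can diff the two before deploying.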